- title: 'Revisiting Character-level Adversarial Attacks for Language Models'
abstract: 'Adversarial attacks in Natural Language Processing apply perturbations at the character or token level. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention, as they cannot easily adopt popular gradient-based methods and are thought to be easy to defend against. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving a high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR by $4.84$ percentage points and the USE similarity by $8$ percentage points with respect to the previous state of the art. Our implementation is available at https://github.com/LIONS-EPFL/Charmer.'
volume: 235
URL: https://proceedings.mlr.press/v235/abad-rocamora24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/abad-rocamora24a/abad-rocamora24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-abad-rocamora24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elias
family: Abad Rocamora
- given: Yongtao
family: Wu
- given: Fanghui
family: Liu
- given: Grigorios
family: Chrysos
- given: Volkan
family: Cevher
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1-30
id: abad-rocamora24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1
lastpage: 30
published: 2024-07-08 00:00:00 +0000
- title: 'Adaptively Perturbed Mirror Descent for Learning in Games'
abstract: 'This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space, potentially containing additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, successfully achieves *last-iterate* convergence in scenarios devoid of noise, leading the dynamics to a Nash equilibrium. A recent re-emerging trend underscores the promise of the perturbation approach, where payoff functions are perturbed based on the distance from an anchoring, or *slingshot*, strategy. In response, we propose *Adaptively Perturbed MD* (APMD), which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval. This innovation empowers us to find a Nash equilibrium of the underlying game with guaranteed rates. Empirical demonstrations affirm that our algorithm exhibits significantly accelerated convergence.'
volume: 235
URL: https://proceedings.mlr.press/v235/abe24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/abe24a/abe24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-abe24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kenshi
family: Abe
- given: Kaito
family: Ariu
- given: Mitsuki
family: Sakamoto
- given: Atsushi
family: Iwasaki
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 31-80
id: abe24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 31
lastpage: 80
published: 2024-07-08 00:00:00 +0000
- title: 'InferCept: Efficient Intercept Support for Augmented Large Language Model Inference'
abstract: 'Large language models are increasingly integrated with external environments, tools, and agents like ChatGPT plugins to extend their capability beyond language-centric tasks. However, today’s LLM inference systems are designed for standalone LLMs. They treat each external interaction as the end of LLM generation and form a new request when the interaction finishes, causing unnecessary recomputation of already computed contexts, which accounts for 37-40% of total model forwarding time. This paper presents **InferCept, the first LLM inference framework targeting augmented LLMs** and supporting the efficient interception of LLM generation. InferCept minimizes the GPU resource waste caused by LLM interceptions and dedicates the saved memory to serving more requests. InferCept improves overall serving throughput by **1.6x-2x** and completes 2x more requests per second compared to state-of-the-art LLM inference systems.'
volume: 235
URL: https://proceedings.mlr.press/v235/abhyankar24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/abhyankar24a/abhyankar24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-abhyankar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Reyna
family: Abhyankar
- given: Zijian
family: He
- given: Vikranth
family: Srivatsa
- given: Hao
family: Zhang
- given: Yiying
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 81-95
id: abhyankar24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 81
lastpage: 95
published: 2024-07-08 00:00:00 +0000
- title: 'Balancing Feature Similarity and Label Variability for Optimal Size-Aware One-shot Subset Selection'
abstract: 'Subset or core-set selection offers a data-efficient way for training deep learning models. One-shot subset selection poses additional challenges, as subset selection is performed only once and the full data set becomes unavailable after the selection. However, most existing methods tend to choose either diverse or difficult data samples, which fail to faithfully represent the joint data distribution that comprises both feature and label information. The selection is also performed independently of the subset size, which plays an essential role in determining which types of samples to choose. To address this critical gap, we propose to conduct Feature similarity and Label variability Balanced One-shot Subset Selection (BOSS), aiming to construct an optimal size-aware subset for data-efficient deep learning. We show that a novel balanced core-set loss bound theoretically justifies the need to simultaneously consider both diversity and difficulty to form an optimal subset. It also reveals how the subset size influences the bound. We further connect the inaccessible bound to a practical surrogate target which is tailored to subset sizes and varying levels of overall difficulty. We design a novel Beta-scoring importance function to delicately control the optimal balance of diversity and difficulty. Comprehensive experiments conducted on both synthetic and real data justify the important theoretical properties and demonstrate the superior performance of BOSS as compared with the competitive baselines.'
volume: 235
URL: https://proceedings.mlr.press/v235/acharya24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/acharya24a/acharya24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-acharya24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Abhinab
family: Acharya
- given: Dayou
family: Yu
- given: Qi
family: Yu
- given: Xumin
family: Liu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 96-116
id: acharya24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 96
lastpage: 116
published: 2024-07-08 00:00:00 +0000
- title: 'Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning'
abstract: 'As machine learning becomes more prominent, there is a growing demand to perform several inference tasks in parallel. Multi-task learning (MTL) addresses this challenge by learning a single model that solves several tasks simultaneously and efficiently. Optimizing MTL models often entails first computing the gradient of the loss for each task, and then aggregating all the gradients to obtain a combined update direction. However, common methods following this approach do not consider an important aspect: the sensitivity of the dimensions of the gradients. Some dimensions may be more lenient toward changes while others may be more restrictive. Here, we introduce a novel gradient aggregation procedure using Bayesian inference. We place a probability distribution over the task-specific parameters, which in turn induces a *distribution* over the gradients of the tasks. This valuable information allows us to quantify the uncertainty associated with each of the gradients’ dimensions, which is factored in when aggregating them. We empirically demonstrate the benefits of our approach on a variety of datasets, achieving state-of-the-art performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/achituve24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/achituve24a/achituve24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-achituve24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Idan
family: Achituve
- given: Idit
family: Diamant
- given: Arnon
family: Netzer
- given: Gal
family: Chechik
- given: Ethan
family: Fetaya
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 117-134
id: achituve24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 117
lastpage: 134
published: 2024-07-08 00:00:00 +0000
- title: 'AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers'
abstract: 'Large Language Models are prone to biased predictions and hallucinations, underlining the paramount importance of understanding their model-internal reasoning process. However, achieving faithful attributions for the entirety of a black-box transformer model while maintaining computational efficiency is an unsolved challenge. By extending the Layer-wise Relevance Propagation attribution method to handle attention layers, we address these challenges effectively. While partial solutions exist, our method is the first to faithfully and holistically attribute not only input but also latent representations of transformer models, with computational efficiency similar to that of a single backward pass. Through extensive evaluations against existing methods on LLaMa 2, Mixtral 8x7b, Flan-T5 and vision transformer architectures, we demonstrate that our proposed approach surpasses alternative methods in terms of faithfulness and enables the understanding of latent representations, opening the door for concept-based explanations. We provide an LRP library at https://github.com/rachtibat/LRP-eXplains-Transformers.'
volume: 235
URL: https://proceedings.mlr.press/v235/achtibat24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/achtibat24a/achtibat24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-achtibat24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Reduan
family: Achtibat
- given: Sayed Mohammad Vakilzadeh
family: Hatefi
- given: Maximilian
family: Dreyer
- given: Aakriti
family: Jain
- given: Thomas
family: Wiegand
- given: Sebastian
family: Lapuschkin
- given: Wojciech
family: Samek
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 135-168
id: achtibat24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 135
lastpage: 168
published: 2024-07-08 00:00:00 +0000
- title: 'A Unified Framework for Learning with Nonlinear Model Classes from Arbitrary Linear Samples'
abstract: 'This work considers the fundamental problem of learning an unknown object from training data using a given model class. We introduce a framework that allows for objects in arbitrary Hilbert spaces, general types of (random) linear measurements as training data and general types of nonlinear model classes. We establish a series of learning guarantees for this framework, which provide explicit relations between the amount of training data and the model class to ensure near-best generalization bounds. In doing so, we introduce the key notion of the *variation* of a model class with respect to a distribution of sampling operators. We show that this framework can accommodate many different types of well-known problems of interest, such as matrix sketching by random sampling, compressed sensing with isotropic vectors, active learning in regression and compressed sensing with generative models. In all cases, known results become straightforward corollaries of our general theory. Hence, this work provides a powerful framework for studying and analyzing many different types of learning problems.'
volume: 235
URL: https://proceedings.mlr.press/v235/adcock24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/adcock24a/adcock24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-adcock24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ben
family: Adcock
- given: Juan M.
family: Cardenas
- given: Nick
family: Dexter
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 169-202
id: adcock24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 169
lastpage: 202
published: 2024-07-08 00:00:00 +0000
- title: 'FrameQuant: Flexible Low-Bit Quantization for Transformers'
abstract: 'Transformers are the backbone of powerful foundation models for many Vision and Natural Language Processing tasks. But their compute and memory/storage footprint is large, so serving such models is expensive, often requiring high-end hardware. To mitigate this difficulty, Post-Training Quantization seeks to modify a pre-trained model and quantize it to eight bits or lower, significantly boosting compute/memory/latency efficiency. Such models have been successfully quantized to four bits with some performance loss. In this work, we outline a simple scheme to quantize Transformer-based models to just two bits (plus some overhead) with only a small drop in accuracy. Key to our formulation is a concept borrowed from harmonic analysis called Fusion Frames. Our main finding is that the quantization must take place not in the original weight space, but instead in the Fusion Frame representations. If quantization is interpreted as the addition of noise, our casting of the problem allows invoking an extensive body of known consistent recovery and noise robustness guarantees. Further, if desired, de-noising filters are known in closed form. We show empirically, via a variety of experiments, that (almost) two-bit quantization for Transformer models promises sizable efficiency gains. The code is available at https://github.com/vsingh-group/FrameQuant'
volume: 235
URL: https://proceedings.mlr.press/v235/adepu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/adepu24a/adepu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-adepu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Harshavardhan
family: Adepu
- given: Zhanpeng
family: Zeng
- given: Li
family: Zhang
- given: Vikas
family: Singh
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 203-227
id: adepu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 203
lastpage: 227
published: 2024-07-08 00:00:00 +0000
- title: 'BeigeMaps: Behavioral Eigenmaps for Reinforcement Learning from Images'
abstract: 'Training reinforcement learning (RL) agents directly from high-dimensional image observations continues to be a challenging problem. A recent line of work on behavioral distances proposes to learn representations that encode behavioral similarities quantified by the bisimulation metric. By learning an isometric mapping to a lower dimensional space that preserves this metric, such methods attempt to learn representations that group together functionally similar states. However, such an isometric mapping may not exist, making the learning objective ill-defined. We propose an alternative objective that allows distortions in long-range distances, while preserving *local* metric structure – inducing representations that highlight natural clusters in the state space. This leads to new representations, which we term Behavioral Eigenmaps (BeigeMaps), corresponding to the eigenfunctions of similarity kernels induced by behavioral distances. We empirically demonstrate that when added as a drop-in modification, BeigeMaps improve the policy performance of prior behavioral-distance-based RL algorithms.'
volume: 235
URL: https://proceedings.mlr.press/v235/adhikary24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/adhikary24a/adhikary24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-adhikary24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sandesh
family: Adhikary
- given: Anqi
family: Li
- given: Byron
family: Boots
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 228-245
id: adhikary24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 228
lastpage: 245
published: 2024-07-08 00:00:00 +0000
- title: 'Discovering Bias in Latent Space: An Unsupervised Debiasing Approach'
abstract: 'The question-answering (QA) capabilities of foundation models are highly sensitive to prompt variations, rendering their performance susceptible to superficial, non-meaning-altering changes. This vulnerability often stems from the model’s preference or bias towards specific input characteristics, such as option position or superficial image features in multi-modal settings. We propose to rectify this bias directly in the model’s internal representation. Our approach, SteerFair, finds the bias direction in the model’s representation space and steers activation values away from it during inference. Specifically, we exploit the observation that bias often adheres to simple association rules, such as the spurious association between the first option and correctness likelihood. Next, we construct demonstrations of these rules from unlabeled samples and use them to identify the bias directions. We empirically show that SteerFair significantly reduces instruction-tuned model performance variance across prompt modifications on three benchmark tasks. Remarkably, our approach surpasses a supervised baseline with 100 labels by an average of 10.86% accuracy points and 12.95 score points, and matches the performance of the baseline with 500 labels.'
volume: 235
URL: https://proceedings.mlr.press/v235/adila24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/adila24a/adila24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-adila24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dyah
family: Adila
- given: Shuai
family: Zhang
- given: Boran
family: Han
- given: Bernie
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 246-261
id: adila24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 246
lastpage: 261
published: 2024-07-08 00:00:00 +0000
- title: 'Optimal Coresets for Low-Dimensional Geometric Median'
abstract: 'We investigate coresets for approximating the cost with respect to median queries. In this problem, we are given a set of points $P\subset \mathbb{R}^d$ and median queries are $\sum_{p\in P} ||p-c||$ for any point $c\in \mathbb{R}^d$. Our goal is to compute a small weighted summary $S\subset P$ such that the cost of any median query is approximated within a multiplicative $(1\pm\varepsilon)$ factor. We provide matching upper and lower bounds on the number of points contained in $S$ of the order $\tilde{\Theta}\left(\varepsilon^{-d/(d+1)}\right)$.'
volume: 235
URL: https://proceedings.mlr.press/v235/afshani24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/afshani24a/afshani24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-afshani24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Peyman
family: Afshani
- given: Chris
family: Schwiegelshohn
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 262-270
id: afshani24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 262
lastpage: 270
published: 2024-07-08 00:00:00 +0000
- title: 'REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates'
abstract: 'EEG-based seizure detection models face challenges in terms of inference speed and memory efficiency, limiting their real-time implementation in clinical devices. This paper introduces a novel graph-based residual state update mechanism (REST) for real-time EEG signal analysis in applications such as epileptic seizure detection. By leveraging a combination of graph neural networks and recurrent structures, REST efficiently captures both non-Euclidean geometry and temporal dependencies within EEG data. Our model demonstrates high accuracy in both seizure detection and classification tasks. Notably, REST achieves a remarkable 9-fold acceleration in inference speed compared to state-of-the-art models, while simultaneously demanding substantially less memory than the smallest model employed for this task. These attributes position REST as a promising candidate for real-time implementation in clinical devices, such as Responsive Neurostimulation or seizure alert systems.'
volume: 235
URL: https://proceedings.mlr.press/v235/afzal24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/afzal24a/afzal24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-afzal24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arshia
family: Afzal
- given: Grigorios
family: Chrysos
- given: Volkan
family: Cevher
- given: Mahsa
family: Shoaran
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 271-290
id: afzal24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 271
lastpage: 290
published: 2024-07-08 00:00:00 +0000
- title: 'CHAI: Clustered Head Attention for Efficient LLM Inference'
abstract: 'Large Language Models (LLMs) with hundreds of billions of parameters have transformed the field of machine learning. However, serving these models at inference time is both compute and memory intensive, where a single request can require multiple GPUs and tens of gigabytes of memory. Multi-head attention is one of the key components of LLMs, which can account for over 50% of an LLM’s memory and compute requirements. We observe that there is a high amount of redundancy across heads in which tokens they pay attention to. Based on this insight, we propose Clustered Head Attention (CHAI). CHAI combines heads with a high amount of correlation for self-attention at runtime, thus reducing both memory and compute. In our experiments, we show that CHAI is able to reduce the memory requirements for storing the K,V cache by up to 21.4% and inference time latency by up to 1.73× without any fine-tuning required. CHAI achieves this with a maximum 3.2% deviation in accuracy across 3 different models (i.e. OPT-66B, LLAMA-7B, LLAMA-33B) and 5 different evaluation datasets.'
volume: 235
URL: https://proceedings.mlr.press/v235/agarwal24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agarwal24a/agarwal24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agarwal24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Saurabh
family: Agarwal
- given: Bilge
family: Acun
- given: Basil
family: Hosmer
- given: Mostafa
family: Elhoushi
- given: Yejin
family: Lee
- given: Shivaram
family: Venkataraman
- given: Dimitris
family: Papailiopoulos
- given: Carole-Jean
family: Wu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 291-312
id: agarwal24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 291
lastpage: 312
published: 2024-07-08 00:00:00 +0000
- title: 'Learning to Play Atari in a World of Tokens'
abstract: 'Model-based reinforcement learning agents utilizing transformers have shown improved sample efficiency due to their ability to model extended context, resulting in more accurate world models. However, for complex reasoning and planning tasks, these methods primarily rely on continuous representations. This complicates modeling of discrete properties of the real world such as disjoint object classes between which interpolation is not plausible. In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model. For handling partial observability, we aggregate information from past time steps as memory tokens. DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games. We release our code at https://pranaval.github.io/DART/.'
volume: 235
URL: https://proceedings.mlr.press/v235/agarwal24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agarwal24b/agarwal24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agarwal24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pranav
family: Agarwal
- given: Sheldon
family: Andrews
- given: Samira Ebrahimi
family: Kahou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 313-328
id: agarwal24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 313
lastpage: 328
published: 2024-07-08 00:00:00 +0000
- title: 'Probabilistic Generating Circuits - Demystified'
abstract: 'Zhang et al. (ICML 2021, PMLR 139, pp. 12447–12457) introduced probabilistic generating circuits (PGCs) as a probabilistic model to unify probabilistic circuits (PCs) and determinantal point processes (DPPs). At first glance, PGCs store a distribution in a very different way: they compute the probability generating polynomial instead of the probability mass function, and it seems that this is the main reason why PGCs are more powerful than PCs or DPPs. However, PGCs also allow for negative weights, whereas classical PCs assume that all weights are nonnegative. One main insight of this work is that the negative weights, not the different representation, are the cause for the power of PGCs. PGCs are PCs in disguise: we show how to transform any PGC on binary variables into a PC with negative weights with only polynomial blowup. PGCs were defined by Zhang et al. only for binary random variables. As our second main result, we show that there is a good reason for this: we prove that PGCs for categorical variables with larger image size do not support tractable marginalization unless NP=P. On the other hand, we show that we can model categorical variables with larger image size as PCs with negative weights computing set-multilinear polynomials. These allow for tractable marginalization. In this sense, PCs with negative weights strictly subsume PGCs.'
volume: 235
URL: https://proceedings.mlr.press/v235/agarwal24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agarwal24c/agarwal24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agarwal24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sanyam
family: Agarwal
- given: Markus
family: Bläser
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 329-342
id: agarwal24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 329
lastpage: 342
published: 2024-07-08 00:00:00 +0000
- title: 'Improved Differentially Private and Lazy Online Convex Optimization: Lower Regret without Smoothness Requirements'
abstract: 'We design differentially private regret-minimizing algorithms in the online convex optimization (OCO) framework. Unlike recent results, our algorithms and analyses do not require smoothness, thus yielding the first private regret bounds with an optimal leading-order term for non-smooth loss functions. Additionally, even for smooth losses, the resulting regret guarantees improve upon previous results in terms of their dependence on the dimension. Our results provide the best known rates for DP-OCO in all practical regimes of the privacy parameter, barring when it is exceptionally small. The principal innovation in our algorithm design is the use of sampling from strongly log-concave densities which satisfy the Log-Sobolev Inequality. The resulting concentration of measure allows us to obtain a better trade-off for the dimension factors than prior work, leading to improved results. Following previous works on DP-OCO, the proposed algorithm explicitly limits the number of switches via rejection sampling. Thus, independently of privacy constraints, the algorithm also provides improved results for online convex optimization with a switching budget.'
volume: 235
URL: https://proceedings.mlr.press/v235/agarwal24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agarwal24d/agarwal24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agarwal24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Naman
family: Agarwal
- given: Satyen
family: Kale
- given: Karan
family: Singh
- given: Abhradeep
family: Guha Thakurta
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 343-361
id: agarwal24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 343
lastpage: 361
published: 2024-07-08 00:00:00 +0000
- title: 'The Non-linear $F$-Design and Applications to Interactive Learning'
abstract: 'We propose a generalization of the classical G-optimal design concept to non-linear function classes. The criterion, termed F-design, coincides with G-design in the linear case. We compute the value of the optimal design, termed the F-condition number, for several non-linear function classes. We further provide algorithms to construct designs with a bounded F-condition number. Finally, we employ the F-design in a variety of interactive machine learning tasks, where the design is naturally useful for data collection or exploration. We show that in four diverse settings of confidence band construction, contextual bandits, model-free reinforcement learning, and active learning, F-design can be combined with existing approaches in a black-box manner to yield state-of-the-art results in known problem settings as well as to generalize to novel ones.'
volume: 235
URL: https://proceedings.mlr.press/v235/agarwal24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agarwal24e/agarwal24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agarwal24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alekh
family: Agarwal
- given: Jian
family: Qian
- given: Alexander
family: Rakhlin
- given: Tong
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 362-396
id: agarwal24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 362
lastpage: 396
published: 2024-07-08 00:00:00 +0000
- title: 'ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints'
abstract: 'Reinforcement Learning (RL) for constrained MDPs (CMDPs) is an increasingly important problem for various applications. Often, the average criterion is more suitable than the discounted criterion. Yet, RL for average-CMDPs (ACMDPs) remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well for the average CMDP setting. In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by trust region-based policy optimization algorithms. We develop basic sensitivity theory for average CMDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging OpenAI Gym environments, show its superior empirical performance when compared to other state-of-the-art algorithms adapted for the ACMDPs.'
volume: 235
URL: https://proceedings.mlr.press/v235/agnihotri24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agnihotri24a/agnihotri24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agnihotri24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Akhil
family: Agnihotri
- given: Rahul
family: Jain
- given: Haipeng
family: Luo
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 397-415
id: agnihotri24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 397
lastpage: 415
published: 2024-07-08 00:00:00 +0000
- title: 'CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks'
abstract: 'While neural networks allow highly accurate predictions in many tasks, their lack of robustness towards even slight input perturbations often hampers their deployment. Adversarial attacks such as the seminal *projected gradient descent* (PGD) offer an effective means to evaluate a model’s robustness, and dedicated solutions have been proposed for attacks on semantic segmentation or optical flow estimation. While they attempt to increase the attack’s efficiency, a further objective is to balance its effect, so that it acts on the entire image domain instead of isolated point-wise predictions. This often comes at the cost of optimization stability and thus efficiency. Here, we propose CosPGD, an attack that encourages more balanced errors over the entire image domain while increasing the attack’s overall efficiency. To this end, CosPGD leverages a simple alignment score computed from any pixel-wise prediction and its target to scale the loss in a smooth and fully differentiable way. It leads to efficient evaluations of a model’s robustness for semantic segmentation as well as regression models (such as optical flow, disparity estimation, or image restoration), allowing it to outperform the previous SotA attack on semantic segmentation. We provide code for the CosPGD algorithm and example usage at https://github.com/shashankskagnihotri/cospgd.'
volume: 235
URL: https://proceedings.mlr.press/v235/agnihotri24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agnihotri24b/agnihotri24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agnihotri24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shashank
family: Agnihotri
- given: Steffen
family: Jung
- given: Margret
family: Keuper
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 416-451
id: agnihotri24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 416
lastpage: 451
published: 2024-07-08 00:00:00 +0000
- title: 'LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions'
abstract: 'A promising approach to preserving model performance in linearized transformers is to employ position-based re-weighting functions. However, state-of-the-art re-weighting functions rely heavily on target sequence lengths, making it difficult or impossible to apply them to autoregressive and simultaneous tasks, where the target and sometimes even the input sequence length are unknown. To address this issue, we propose Learned Proportions (LeaP) and LeaPformers. Our contribution is built on two major components. First, we generalize the dependence on explicit positional representations and sequence lengths into dependence on sequence proportions for re-weighting. Second, we replace static positional representations with dynamic proportions derived via a compact module, enabling more flexible attention concentration patterns. We evaluate LeaPformer against eight representative efficient transformers on the Long-Range Arena benchmark, where we show that LeaPformer achieves the best quality-throughput trade-off, as well as apply LeaPformer to Wikitext-103b autoregressive language modeling and simultaneous speech-to-text translation for two language pairs, achieving competitive results in both tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/agostinelli-iii24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agostinelli-iii24a/agostinelli-iii24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agostinelli-iii24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Victor
  family: Agostinelli III
- given: Sanghyun
family: Hong
- given: Lizhong
family: Chen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 452-470
id: agostinelli-iii24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 452
lastpage: 470
published: 2024-07-08 00:00:00 +0000
- title: 'Policy Evaluation for Variance in Average Reward Reinforcement Learning'
abstract: 'We consider an average reward reinforcement learning (RL) problem and work with asymptotic variance as a risk measure to model safety-critical applications. We design a temporal-difference (TD) type algorithm tailored for policy evaluation in this context. Our algorithm is based on linear stochastic approximation of an equivalent formulation of the asymptotic variance in terms of the solution of the Poisson equation. We consider both the tabular and linear function approximation settings, and establish $\tilde {O}(1/k)$ finite time convergence rate, where $k$ is the number of steps of the algorithm. Our work paves the way for developing actor-critic style algorithms for variance-constrained RL. To the best of our knowledge, our result provides the first sequential estimator for asymptotic variance of a Markov chain with provable finite sample guarantees, which is of independent interest.'
volume: 235
URL: https://proceedings.mlr.press/v235/agrawal24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/agrawal24a/agrawal24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-agrawal24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shubhada
family: Agrawal
- given: Prashanth L
family: A
- given: Siva Theja
family: Maguluri
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 471-502
id: agrawal24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 471
lastpage: 502
published: 2024-07-08 00:00:00 +0000
- title: 'Distinguishing the Knowable from the Unknowable with Language Models'
abstract: 'We study the feasibility of identifying *epistemic* uncertainty (reflecting a lack of knowledge), as opposed to *aleatoric* uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. In the absence of ground-truth probabilities, we explore a setting where, in order to (approximately) disentangle a given LLM’s uncertainty, a significantly larger model stands in as a proxy for the ground truth. We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level and that probes trained on one text domain generalize to others. Going further, we propose a fully unsupervised method that achieves non-trivial accuracy on the same task. Taken together, we interpret these results as evidence that LLMs naturally contain internal representations of different types of uncertainty that could potentially be leveraged to devise more informative indicators of model confidence in diverse practical settings. Code can be found at: https://github.com/KempnerInstitute/llm_uncertainty'
volume: 235
URL: https://proceedings.mlr.press/v235/ahdritz24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ahdritz24a/ahdritz24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ahdritz24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gustaf
family: Ahdritz
- given: Tian
family: Qin
- given: Nikhil
family: Vyas
- given: Boaz
family: Barak
- given: Benjamin L.
family: Edelman
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 503-549
id: ahdritz24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 503
lastpage: 549
published: 2024-07-08 00:00:00 +0000
- title: 'Unmasking Vulnerabilities: Cardinality Sketches under Adaptive Inputs'
abstract: 'Cardinality sketches are popular data structures that enhance the efficiency of working with large data sets. The sketches are randomized representations of sets that are only of logarithmic size but can support set merges and approximate cardinality (i.e., distinct count) queries. When queries are not adaptive, that is, they do not depend on preceding query responses, the design provides strong guarantees of correctly answering a number of queries exponential in the sketch size $k$. In this work, we investigate the performance of cardinality sketches in adaptive settings and unveil inherent vulnerabilities. We design an attack against the “standard” estimators that constructs an adversarial input by post-processing responses to a set of simple non-adaptive queries of size linear in the sketch size $k$. Empirically, our attack used only $4k$ queries with the widely used HyperLogLog (HLL++; Flajolet et al., 2007; Heule et al., 2013) sketch. The simple attack technique suggests it can be effective with post-processed natural workloads. Finally and importantly, we demonstrate that the vulnerability is inherent as any estimator applied to known sketch structures can be attacked using a number of queries that is quadratic in $k$, matching a generic upper bound.'
volume: 235
URL: https://proceedings.mlr.press/v235/ahmadian24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ahmadian24a/ahmadian24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ahmadian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sara
family: Ahmadian
- given: Edith
family: Cohen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 550-576
id: ahmadian24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 550
lastpage: 576
published: 2024-07-08 00:00:00 +0000
- title: 'OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models'
abstract: 'Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. This paper introduces OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve (mixed integer) linear programming problems from their natural language descriptions. OptiMUS can develop mathematical models, write and debug solver code, evaluate the generated solutions, and improve its model and code based on these evaluations. OptiMUS utilizes a modular structure to process problems, allowing it to handle problems with long descriptions and complex data without long prompts. Experiments demonstrate that OptiMUS outperforms existing state-of-the-art methods on easy datasets by more than $20$% and on hard datasets (including a new dataset, NLP4LP, released with this paper that features long and complex problems) by more than $30$%. The implementation and the datasets are available at https://github.com/teshnizi/OptiMUS.'
volume: 235
URL: https://proceedings.mlr.press/v235/ahmaditeshnizi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ahmaditeshnizi24a/ahmaditeshnizi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ahmaditeshnizi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ali
family: Ahmaditeshnizi
- given: Wenzhi
family: Gao
- given: Madeleine
family: Udell
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 577-596
id: ahmaditeshnizi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 577
lastpage: 596
published: 2024-07-08 00:00:00 +0000
- title: 'How to Escape Sharp Minima with Random Perturbations'
abstract: 'Modern machine learning applications have witnessed the remarkable success of optimization algorithms that are designed to find flat minima. Motivated by this design choice, we undertake a formal study that (i) formulates the notion of flat minima, and (ii) studies the complexity of finding them. Specifically, we adopt the trace of the Hessian of the cost function as a measure of flatness, and use it to formally define the notion of approximate flat minima. Under this notion, we then analyze algorithms that find approximate flat minima efficiently. For general cost functions, we discuss a gradient-based algorithm that finds an approximate flat local minimum efficiently. The main component of the algorithm is to use gradients computed from randomly perturbed iterates to estimate a direction that leads to flatter minima. For the setting where the cost function is an empirical risk over training data, we present a faster algorithm that is inspired by a recently proposed practical algorithm called sharpness-aware minimization, supporting its success in practice.'
volume: 235
URL: https://proceedings.mlr.press/v235/ahn24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ahn24a/ahn24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ahn24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kwangjun
family: Ahn
- given: Ali
family: Jadbabaie
- given: Suvrit
family: Sra
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 597-618
id: ahn24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 597
lastpage: 618
published: 2024-07-08 00:00:00 +0000
- title: 'Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise'
abstract: 'Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components still remains limited. In particular, most existing analyses of Adam show a convergence rate that can be simply achieved by non-adaptive algorithms like SGD. In this work, we provide a different perspective based on online learning that underscores the importance of Adam’s algorithmic components. Inspired by Cutkosky et al. (2023), we consider the framework called online learning of updates/increments, where we choose the updates/increments of an optimizer based on an online learner. With this framework, the design of a good optimizer is reduced to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.'
volume: 235
URL: https://proceedings.mlr.press/v235/ahn24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ahn24b/ahn24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ahn24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kwangjun
family: Ahn
- given: Zhiyu
family: Zhang
- given: Yunbum
family: Kook
- given: Yan
family: Dai
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 619-640
id: ahn24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 619
lastpage: 640
published: 2024-07-08 00:00:00 +0000
- title: 'Not all distributional shifts are equal: Fine-grained robust conformal inference'
abstract: 'We introduce a fine-grained framework for uncertainty quantification of predictive models under distributional shifts. This framework distinguishes the shift in covariate distributions from that in the conditional relationship between the outcome ($Y$) and the covariates ($X$). We propose to reweight the training samples to adjust for an identifiable shift in covariate distribution while protecting against the worst-case conditional distribution shift bounded in an $f$-divergence ball. Based on ideas from conformal inference and distributionally robust learning, we present an algorithm that outputs (approximately) valid and efficient prediction intervals in the presence of distributional shifts. As a use case, we apply the framework to sensitivity analysis of individual treatment effects with hidden confounding. The proposed methods are evaluated in simulations and four real data applications, demonstrating superior robustness and efficiency compared with existing benchmarks.'
volume: 235
URL: https://proceedings.mlr.press/v235/ai24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ai24a/ai24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ai24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiahao
family: Ai
- given: Zhimei
family: Ren
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 641-665
id: ai24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 641
lastpage: 665
published: 2024-07-08 00:00:00 +0000
- title: 'Triple Changes Estimator for Targeted Policies'
abstract: 'The renowned difference-in-differences (DiD) estimator relies on the assumption of ‘parallel trends,’ which may not hold in many practical applications. To address this issue, economists are increasingly considering the triple difference estimator as a more credible alternative. Both DiD and triple difference are limited to assessing average effects exclusively. An alternative avenue is offered by the changes-in-changes (CiC) estimator, which provides an estimate of the entire counterfactual distribution by relying on assumptions imposed on the distribution of potential outcomes. In this work, we extend the triple difference estimator to accommodate the CiC framework, presenting the ‘triple changes estimator’ and its identification assumptions, thereby expanding the scope of the CiC paradigm. Subsequently, we empirically evaluate the proposed framework and apply it to a study examining the impact of Medicaid expansion on children’s preventive care.'
volume: 235
URL: https://proceedings.mlr.press/v235/akbari24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/akbari24a/akbari24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-akbari24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sina
family: Akbari
- given: Negar
family: Kiyavash
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 666-695
id: akbari24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 666
lastpage: 695
published: 2024-07-08 00:00:00 +0000
- title: 'Improving Computational Complexity in Statistical Models with Local Curvature Information'
abstract: 'It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for practical applications. To further improve that computational complexity, we consider utilizing the local curvature information for parameter estimation. Even though there is a rich literature on using the local curvature information for optimization, the statistical rate of these methods in statistical models, to the best of our knowledge, has not been studied rigorously. The major challenge of this problem is due to the non-convex nature of the sample loss function. To shed light on these problems, we specifically study the normalized gradient descent (NormGD) algorithm, a variant of the gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function, and deal with the aforementioned issue with a population-to-sample analysis. When the population loss function is homogeneous, the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal computational complexity $\mathcal{O}(n)$ to reach the final statistical radius, which is cheaper than the complexity $\mathcal{O}(n^{\tau})$ of the fixed step-size gradient descent algorithm for some $\tau > 1$.'
volume: 235
URL: https://proceedings.mlr.press/v235/akbarian24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/akbarian24a/akbarian24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-akbarian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pedram
family: Akbarian
- given: Tongzheng
family: Ren
- given: Jiacheng
family: Zhuo
- given: Sujay
family: Sanghavi
- given: Nhat
family: Ho
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 696-719
id: akbarian24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 696
lastpage: 719
published: 2024-07-08 00:00:00 +0000
- title: 'Learning Mixtures of Gaussian Processes through Random Projection'
abstract: 'We propose an ensemble clustering framework to uncover latent cluster labels in functional data generated from a Gaussian process mixture. Our method exploits the fact that the projection coefficients of the functional data onto any given projection function follow a univariate Gaussian mixture model (GMM). By conducting multiple one-dimensional projections and learning a univariate GMM for each, we create an ensemble of GMMs. Each GMM serves as a base clustering, and applying ensemble clustering yields a consensus clustering. Our approach significantly reduces computational complexity compared to state-of-the-art methods, and we provide theoretical guarantees on the identifiability and learnability of Gaussian process mixtures. Extensive experiments on synthetic and real datasets confirm the superiority of our method over existing techniques.'
volume: 235
URL: https://proceedings.mlr.press/v235/akeweje24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/akeweje24a/akeweje24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-akeweje24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Emmanuel
family: Akeweje
- given: Mimi
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 720-739
id: akeweje24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 720
lastpage: 739
published: 2024-07-08 00:00:00 +0000
- title: 'Encodings for Prediction-based Neural Architecture Search'
abstract: 'Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adjacency matrix describing the graph structure of a neural network, novel encodings embrace a variety of approaches from unsupervised pretraining of latent representations to vectors of zero-cost proxies. In this paper, we categorize and investigate neural encodings from three main types: structural, learned, and score-based. Furthermore, we extend these encodings and introduce *unified encodings*, that extend NAS predictors to multiple search spaces. Our analysis draws from experiments conducted on over 1.5 million neural network architectures on NAS spaces such as NASBench-101 (NB101), NB201, NB301, Network Design Spaces (NDS), and TransNASBench-101. Building on our study, we present our predictor **FLAN**: **Fl**ow **A**ttention for **N**AS. FLAN integrates critical insights on predictor design, transfer learning, and *unified encodings* to enable more than an order of magnitude cost reduction for training NAS accuracy predictors. Our implementation and encodings for all neural networks are open-sourced at https://github.com/abdelfattah-lab/flan_nas.'
volume: 235
URL: https://proceedings.mlr.press/v235/akhauri24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/akhauri24a/akhauri24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-akhauri24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yash
family: Akhauri
- given: Mohamed S
family: Abdelfattah
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 740-759
id: akhauri24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 740
lastpage: 759
published: 2024-07-08 00:00:00 +0000
- title: 'Iterated Denoising Energy Matching for Sampling from Boltzmann Densities'
abstract: 'Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient—and no data samples—to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective is *simulation-free* and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant $n$-body particle systems. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2-5\times$ faster, which allows it to be the first method to train using energy on the challenging $55$-particle Lennard-Jones system.'
volume: 235
URL: https://proceedings.mlr.press/v235/akhound-sadegh24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/akhound-sadegh24a/akhound-sadegh24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-akhound-sadegh24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tara
family: Akhound-Sadegh
- given: Jarrid
family: Rector-Brooks
- given: Joey
family: Bose
- given: Sarthak
family: Mittal
- given: Pablo
family: Lemos
- given: Cheng-Hao
family: Liu
- given: Marcin
family: Sendera
- given: Siamak
family: Ravanbakhsh
- given: Gauthier
family: Gidel
- given: Yoshua
family: Bengio
- given: Nikolay
family: Malkin
- given: Alexander
family: Tong
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 760-786
id: akhound-sadegh24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 760
lastpage: 786
published: 2024-07-08 00:00:00 +0000
- title: 'In-Context Language Learning: Architectures and Algorithms'
abstract: 'Some neural language models (LMs) exhibit a remarkable capacity for in-context learning (ICL): they can fit predictors to datasets provided as input. While the mechanisms underlying ICL are well-studied in the context of synthetic problems like in-context linear regression, there is still some divergence between these model problems and the “real” ICL exhibited by LMs trained on large text corpora. In this paper, we study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). In ICLL, LMs are presented with a set of strings from a formal language, and must generate additional strings from the same language. We focus on in-context learning of regular languages generated by random finite automata. We evaluate a diverse set of neural sequence models on regular ICLL tasks. We first show that Transformers significantly outperform neural sequence models with recurrent or convolutional representations on ICLL tasks. Next, we provide evidence that they do so by computing in-context n-gram statistics using specialized attention heads. Finally, we show that hard-wiring these heads into neural models improves performance not just on synthetic ICLL, but also on natural language modeling, reducing the perplexity of 340M-parameter Transformers by up to 1.14 points (6.7%) on the SlimPajama dataset. Our results highlight the usefulness of in-context formal language learning as a tool for understanding ICL in models of natural text.'
volume: 235
URL: https://proceedings.mlr.press/v235/akyurek24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/akyurek24a/akyurek24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-akyurek24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ekin
family: Akyürek
- given: Bailin
family: Wang
- given: Yoon
family: Kim
- given: Jacob
family: Andreas
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 787-812
id: akyurek24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 787
lastpage: 812
published: 2024-07-08 00:00:00 +0000
- title: 'Nonlinear Filtering with Brenier Optimal Transport Maps'
abstract: 'This paper is concerned with the problem of nonlinear filtering, i.e., computing the conditional distribution of the state of a stochastic dynamical system given a history of noisy partial observations. Conventional sequential importance resampling (SIR) particle filters suffer from fundamental limitations, in scenarios involving degenerate likelihoods or high-dimensional states, due to the weight degeneracy issue. In this paper, we explore an alternative method, which is based on estimating the Brenier optimal transport (OT) map from the current prior distribution of the state to the posterior distribution at the next time step. Unlike SIR particle filters, the OT formulation does not require the analytical form of the likelihood. Moreover, it allows us to harness the approximation power of neural networks to model complex and multi-modal distributions and employ stochastic optimization algorithms to enhance scalability. Extensive numerical experiments are presented that compare the OT method to the SIR particle filter and the ensemble Kalman filter, evaluating the performance in terms of sample efficiency, high-dimensional scalability, and the ability to capture complex and multi-modal distributions.'
volume: 235
URL: https://proceedings.mlr.press/v235/al-jarrah24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/al-jarrah24a/al-jarrah24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-al-jarrah24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mohammad
family: Al-Jarrah
- given: Niyizhen
family: Jin
- given: Bamdad
family: Hosseini
- given: Amirhossein
family: Taghvaei
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 813-839
id: al-jarrah24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 813
lastpage: 839
published: 2024-07-08 00:00:00 +0000
- title: 'Revisiting Inexact Fixed-Point Iterations for Min-Max Problems: Stochasticity and Structured Nonconvexity'
abstract: 'We focus on constrained, $L$-smooth, potentially stochastic and nonconvex-nonconcave min-max problems either satisfying $\rho$-cohypomonotonicity or admitting a solution to the $\rho$-weakly Minty Variational Inequality (MVI), where larger values of the parameter $\rho>0$ correspond to a greater degree of nonconvexity. These problem classes include examples in two player reinforcement learning, interaction dominant min-max problems, and certain synthetic test problems on which classical min-max algorithms fail. It has been conjectured that first-order methods can tolerate a value of $\rho$ no larger than $\frac{1}{L}$, but existing results in the literature have stagnated at the tighter requirement $\rho < \frac{1}{2L}$. With a simple argument, we obtain optimal or best-known complexity guarantees with cohypomonotonicity or weak MVI conditions for $\rho < \frac{1}{L}$. Our first main insight for the improvements in the convergence analyses is to harness the recently proposed *conic nonexpansiveness* property of operators. Second, we provide a refined analysis for inexact Halpern iteration that relaxes the required inexactness level to improve some state-of-the-art complexity results even for constrained stochastic convex-concave min-max problems. Third, we analyze a stochastic inexact Krasnosel’skii-Mann iteration with a multilevel Monte Carlo estimator when the assumptions only hold with respect to a solution.'
volume: 235
URL: https://proceedings.mlr.press/v235/alacaoglu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/alacaoglu24a/alacaoglu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-alacaoglu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ahmet
family: Alacaoglu
- given: Donghwan
family: Kim
- given: Stephen
family: Wright
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 840-878
id: alacaoglu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 840
lastpage: 878
published: 2024-07-08 00:00:00 +0000
- title: 'Gaussian Processes on Cellular Complexes'
abstract: 'In recent years, there has been considerable interest in developing machine learning models on graphs to account for topological inductive biases. In particular, recent attention has been given to Gaussian processes on such structures since they can additionally account for uncertainty. However, graphs are limited to modelling relations between two vertices. In this paper, we go beyond this dyadic setting and consider polyadic relations that include interactions between vertices, edges and one of their generalisations, known as cells. Specifically, we propose Gaussian processes on cellular complexes, a generalisation of graphs that captures interactions between these higher-order cells. One of our key contributions is the derivation of two novel kernels, one that generalises the graph Matérn kernel and one that additionally mixes information of different cell types.'
volume: 235
URL: https://proceedings.mlr.press/v235/alain24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/alain24a/alain24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-alain24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mathieu
family: Alain
- given: So
family: Takao
- given: Brooks
family: Paige
- given: Marc Peter
family: Deisenroth
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 879-905
id: alain24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 879
lastpage: 905
published: 2024-07-08 00:00:00 +0000
- title: 'Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making'
abstract: 'Fair decision making has largely been studied with respect to a single decision. Here we investigate the notion of fairness in the context of sequential decision making where multiple stakeholders can be affected by the outcomes of decisions. We observe that fairness often depends on the history of the sequential decision-making process, and in this sense it is inherently non-Markovian. We further observe that fairness often needs to be assessed at time points *within* the process, not just at the end of the process. To advance our understanding of this class of fairness problems, we explore the notion of non-Markovian fairness in the context of sequential decision making. We identify properties of non-Markovian fairness, including notions of long-term, anytime, periodic, and bounded fairness. We explore the interplay between non-Markovian fairness and memory and how memory can support construction of fair policies. Finally, we introduce the FairQCM algorithm, which can automatically augment its training data to improve sample efficiency in the synthesis of fair policies via reinforcement learning.'
volume: 235
URL: https://proceedings.mlr.press/v235/alamdari24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/alamdari24a/alamdari24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-alamdari24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Parand A.
family: Alamdari
- given: Toryn Q.
family: Klassen
- given: Elliot
family: Creager
- given: Sheila A.
family: Mcilraith
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 906-920
id: alamdari24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 906
lastpage: 920
published: 2024-07-08 00:00:00 +0000
- title: 'Stochastic Interpolants with Data-Dependent Couplings'
abstract: 'Generative models inspired by dynamical transport of measure – such as flows and diffusions – construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to *couple* the base and the target densities, whereby samples from the base are computed conditionally given samples from the target in a way that is different from (but does not preclude) incorporating information about class labels or continuous embeddings. This enables us to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting. The code is available at https://github.com/interpolants/couplings.'
volume: 235
URL: https://proceedings.mlr.press/v235/albergo24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/albergo24a/albergo24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-albergo24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michael Samuel
family: Albergo
- given: Mark
family: Goldstein
- given: Nicholas Matthew
family: Boffi
- given: Rajesh
family: Ranganath
- given: Eric
family: Vanden-Eijnden
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 921-937
id: albergo24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 921
lastpage: 937
published: 2024-07-08 00:00:00 +0000
- title: 'Evaluating Model Bias Requires Characterizing its Mistakes'
abstract: 'The ability to properly benchmark model performance in the face of spurious correlations is important to both build better predictors and increase confidence that models are operating as intended. We demonstrate that characterizing (as opposed to simply quantifying) model mistakes across subgroups is pivotal to properly reflect model biases, which are ignored by standard metrics such as worst-group accuracy or accuracy gap. Inspired by the hypothesis testing framework, we introduce SkewSize, a principled and flexible metric that captures bias from mistakes in a model’s predictions. It can be used in multi-class settings or generalised to the open vocabulary setting of generative models. SkewSize is an aggregation of the effect size of the interaction between two categorical variables: the spurious variable representing the bias attribute and the model’s prediction. We demonstrate the utility of SkewSize in multiple settings including: standard vision models trained on synthetic data, vision models trained on ImageNet, and large scale vision-and-language models from the BLIP-2 family. In each case, the proposed SkewSize is able to highlight biases not captured by other metrics, while also providing insights on the impact of recently proposed techniques, such as instruction tuning.'
volume: 235
URL: https://proceedings.mlr.press/v235/albuquerque24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/albuquerque24a/albuquerque24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-albuquerque24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Isabela
family: Albuquerque
- given: Jessica
family: Schrouff
- given: David
family: Warde-Farley
- given: Ali Taylan
family: Cemgil
- given: Sven
family: Gowal
- given: Olivia
family: Wiles
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 938-954
id: albuquerque24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 938
lastpage: 954
published: 2024-07-08 00:00:00 +0000
- title: 'Energy-Efficient Gaussian Processes Using Low-Precision Arithmetic'
abstract: 'The widespread use of artificial intelligence requires finding energy-efficient paradigms for the field. We propose to reduce the energy consumption of Gaussian process regression using low-precision floating-point representations. We explore how low-precision representations impact the results of Gaussian process regression and how data set properties, implementation approach, model performance, and energy consumption interact. Our findings show that a well-conditioned kernel matrix allows reducing the energy consumption by up to 89.01% for 98.08% of arithmetic operations with little to no impact on model performance. Our findings are relevant whenever one needs to invert a symmetric full-rank matrix.'
volume: 235
URL: https://proceedings.mlr.press/v235/alder24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/alder24a/alder24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-alder24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nicolas
family: Alder
- given: Ralf
family: Herbrich
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 955-975
id: alder24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 955
lastpage: 975
published: 2024-07-08 00:00:00 +0000
- title: 'Evaluation of Test-Time Adaptation Under Computational Time Constraints'
abstract: 'This paper proposes a novel online evaluation protocol for Test Time Adaptation (TTA) methods, which penalizes slower methods by providing them with fewer samples for adaptation. TTA methods leverage unlabeled data at test time to adapt to distribution shifts. Though many effective methods have been proposed, their impressive performance usually comes at the cost of significantly increased computation budgets. Current evaluation protocols overlook the effect of this extra computation cost, affecting their real-world applicability. To address this issue, we propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream, thereby accounting for the method’s adaptation speed. We apply our proposed protocol to benchmark several TTA methods on multiple datasets and scenarios. Extensive experiments show that, when accounting for inference speed, simple and fast approaches can outperform more sophisticated but slower methods. For example, SHOT, from 2020, outperforms the state-of-the-art method SAR, from 2023, under our online setting. Our results reveal the importance of developing practical TTA methods that are both accurate and efficient.'
volume: 235
URL: https://proceedings.mlr.press/v235/alfarra24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/alfarra24a/alfarra24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-alfarra24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Motasem
family: Alfarra
- given: Hani
family: Itani
- given: Alejandro
family: Pardo
- given: Shyma Yaser
family: Alhuwaider
- given: Merey
family: Ramazanova
- given: Juan Camilo
family: Perez
- given: Zhipeng
family: Cai
- given: Matthias
family: Müller
- given: Bernard
family: Ghanem
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 976-991
id: alfarra24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 976
lastpage: 991
published: 2024-07-08 00:00:00 +0000
- title: 'On the Weight Dynamics of Deep Normalized Networks'
abstract: 'Recent studies have shown that high disparities in effective learning rates (ELRs) across layers in deep neural networks can negatively affect trainability. We formalize how these disparities evolve over time by modeling weight dynamics (evolution of expected gradient and weight norms) of networks with normalization layers, predicting the evolution of layer-wise ELR ratios. We prove that when training with any constant learning rate, ELR ratios converge to 1, despite initial gradient explosion. We identify a "critical learning rate" beyond which ELR disparities widen, which only depends on current ELRs. To validate our findings, we devise a hyper-parameter-free warm-up method that successfully minimizes ELR spread quickly in theory and practice. Our experiments link ELR spread with trainability, a relationship that is most evident in very deep networks with significant gradient magnitude excursions.'
volume: 235
URL: https://proceedings.mlr.press/v235/ali-mehmeti-gopel24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ali-mehmeti-gopel24a/ali-mehmeti-gopel24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ali-mehmeti-gopel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christian H.X.
family: Ali Mehmeti-Göpel
- given: Michael
family: Wand
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 992-1007
id: ali-mehmeti-gopel24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 992
lastpage: 1007
published: 2024-07-08 00:00:00 +0000
- title: 'No Dimensional Sampling Coresets for Classification'
abstract: 'We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the dimension. Moreover, our results are general, apply for distributional input and can use iid samples, so provide sample complexity bounds, and work for a variety of loss functions. A key tool we develop is a Rademacher complexity version of the main sensitivity sampling approach, which can be of independent interest.'
volume: 235
URL: https://proceedings.mlr.press/v235/alishahi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/alishahi24a/alishahi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-alishahi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Meysam
family: Alishahi
- given: Jeff M.
family: Phillips
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1008-1049
id: alishahi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1008
lastpage: 1049
published: 2024-07-08 00:00:00 +0000
- title: 'Unsupervised Evaluation of Code LLMs with Round-Trip Correctness'
abstract: 'To evaluate code large language models (LLMs), research has relied on a few small manually curated benchmarks, such as HumanEval and MBPP, which represent a narrow part of the real-world software domains. In this work, we introduce round-trip correctness (RTC) as an alternative evaluation method. RTC allows Code LLM evaluation on a broader spectrum of real-world software domains without the need for costly human curation. RTC rests on the idea that we can ask a model to make a prediction (e.g., describe some code using natural language), feed that prediction back (e.g., synthesize code from the predicted description), and check if this round-trip leads to code that is semantically equivalent to the original input. We show how to employ RTC to evaluate code synthesis and editing. We find that RTC strongly correlates with model performance on existing narrow-domain code synthesis benchmarks while allowing us to expand to a much broader set of domains and tasks which was not previously possible without costly human annotations.'
volume: 235
URL: https://proceedings.mlr.press/v235/allamanis24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/allamanis24a/allamanis24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-allamanis24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Miltiadis
family: Allamanis
- given: Sheena
family: Panthaplackel
- given: Pengcheng
family: Yin
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1050-1066
id: allamanis24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1050
lastpage: 1066
published: 2024-07-08 00:00:00 +0000
- title: 'Physics of Language Models: Part 3.1, Knowledge Storage and Extraction'
abstract: 'Large language models (LLMs) can store a vast amount of world knowledge, often extractable via question-answering (e.g., “What is Abraham Lincoln’s birthday?”). However, do they answer such questions based on exposure to similar questions during training (i.e., cheating), or by genuinely learning to extract knowledge from sources like Wikipedia? In this paper, we investigate this issue using a controlled biography dataset. We find a strong correlation between the model’s ability to extract knowledge and various *diversity measures* of the training data. **Essentially**, for knowledge to be reliably extracted, it must be sufficiently augmented (e.g., through paraphrasing, sentence shuffling) *during pretraining*. Without such augmentation, knowledge may be memorized but not extractable, leading to 0% accuracy, regardless of subsequent instruction fine-tuning. To understand why this occurs, we employ (nearly) linear probing to demonstrate a strong connection between the observed correlation and *how the model internally encodes knowledge* — whether it is linearly encoded in the hidden embeddings of entity names or distributed across other token embeddings in the training text. **This paper provides several key recommendations for LLM pretraining in the industry:** (1) rewrite the pretraining data — using small, auxiliary models — to provide knowledge augmentation, and (2) incorporate more instruction-finetuning data into the pretraining stage before it becomes too late.'
volume: 235
URL: https://proceedings.mlr.press/v235/allen-zhu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/allen-zhu24a/allen-zhu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-allen-zhu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zeyuan
family: Allen-Zhu
- given: Yuanzhi
family: Li
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1067-1077
id: allen-zhu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1067
lastpage: 1077
published: 2024-07-08 00:00:00 +0000
- title: 'Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates'
abstract: 'The possibility of adversarial (a.k.a., Byzantine) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a robust averaging rule. While a significant amount of work has been devoted to studying the convergence of federated robust averaging (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of client subsampling and local steps, two fundamental FL characteristics. While client subsampling increases the effective fraction of Byzantine clients, local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation by presenting an in-depth analysis of $\mathsf{FedRo}$ tightly analyzing the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly-optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). Also, we show that the rate of improvement in learning accuracy diminishes with respect to the number of clients subsampled, as soon as the sample size exceeds a threshold value. Interestingly, we also observe that under a careful choice of step-sizes, the learning error due to Byzantine clients decreases with the number of local steps. We validate our theory by experiments on the FEMNIST and CIFAR-$10$ image classification tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/allouah24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/allouah24a/allouah24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-allouah24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Youssef
family: Allouah
- given: Sadegh
family: Farhadkhani
- given: Rachid
family: Guerraoui
- given: Nirupam
family: Gupta
- given: Rafael
family: Pinot
- given: Geovani
family: Rizk
- given: Sasha
family: Voitovych
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1078-1114
id: allouah24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1078
lastpage: 1114
published: 2024-07-08 00:00:00 +0000
- title: 'The Privacy Power of Correlated Noise in Decentralized Learning'
abstract: 'Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources without resorting to any central entity, while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., an information hidden to all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use.'
volume: 235
URL: https://proceedings.mlr.press/v235/allouah24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/allouah24b/allouah24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-allouah24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Youssef
family: Allouah
- given: Anastasia
family: Koloskova
- given: Aymane El
family: Firdoussi
- given: Martin
family: Jaggi
- given: Rachid
family: Guerraoui
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1115-1143
id: allouah24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1115
lastpage: 1143
published: 2024-07-08 00:00:00 +0000
- title: 'Predicting Dose-Response Curves with Deep Neural Networks'
abstract: 'Dose-response curves characterize the relationship between the concentration of drugs and their inhibitory effect on the growth of specific types of cells. The predominant Hill-equation model of an ideal enzymatic inhibition unduly simplifies the biochemical reality of many drugs; and for these drugs the widely-used drug performance indicator of the half-inhibitory concentration $IC_{50}$ can lead to poor therapeutic recommendations and poor selections of promising drug candidates. We develop a neural model that uses an embedding of the interaction between drug molecules and the tissue transcriptome to estimate the entire dose-response curve rather than a scalar aggregate. We find that, compared to the prior state of the art, this model excels at interpolating and extrapolating the inhibitory effect of untried concentrations. Unlike prevalent parametric models, it is able to accurately predict dose-response curves of drugs on previously unseen tumor tissues as well as of previously untested drug molecules on established tumor cell lines.'
volume: 235
URL: https://proceedings.mlr.press/v235/alonso-campana24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/alonso-campana24a/alonso-campana24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-alonso-campana24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pedro
family: Alonso Campana
- given: Paul
family: Prasse
- given: Tobias
family: Scheffer
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1144-1154
id: alonso-campana24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1144
lastpage: 1154
published: 2024-07-08 00:00:00 +0000
- title: 'Robust and Conjugate Gaussian Process Regression'
abstract: 'To enable closed form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.'
volume: 235
URL: https://proceedings.mlr.press/v235/altamirano24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/altamirano24a/altamirano24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-altamirano24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matias
family: Altamirano
- given: Francois-Xavier
family: Briol
- given: Jeremias
family: Knoblauch
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1155-1185
id: altamirano24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1155
lastpage: 1185
published: 2024-07-08 00:00:00 +0000
- title: 'Beyond the Norms: Detecting Prediction Errors in Regression Models'
abstract: 'This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g., aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems.'
volume: 235
URL: https://proceedings.mlr.press/v235/altieri24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/altieri24a/altieri24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-altieri24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andres
family: Altieri
- given: Marco
family: Romanelli
- given: Georg
family: Pichler
- given: Florence
family: Alberge
- given: Pablo
family: Piantanida
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1186-1221
id: altieri24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1186
lastpage: 1221
published: 2024-07-08 00:00:00 +0000
- title: 'Position: Stop Making Unscientific AGI Performance Claims'
abstract: 'Developments in the field of Artificial Intelligence (AI), and particularly large language models (LLMs), have created a ’perfect storm’ for observing ’sparks’ of Artificial General Intelligence (AGI) that are spurious. Like simpler models, LLMs distill meaningful representations in their latent embeddings that have been shown to correlate with external variables. Nonetheless, the correlation of such representations has often been linked to human-like intelligence in the latter but not the former. We probe models of varying complexity including random projections, matrix decompositions, deep autoencoders and transformers: all of them successfully distill information that can be used to predict latent or external variables and yet none of them have previously been linked to AGI. We argue and empirically demonstrate that the finding of meaningful patterns in latent spaces of models cannot be seen as evidence in favor of AGI. Additionally, we review literature from the social sciences that shows that humans are prone to seek such patterns and anthropomorphize. We conclude that both the methodological setup and common public image of AI are ideal for the misinterpretation that correlations between model representations and some variables of interest are ’caused’ by the model’s understanding of underlying ’ground truth’ relationships. We, therefore, call for the academic community to exercise extra caution, and to be keenly aware of principles of academic integrity, in interpreting and communicating about AI research outcomes.'
volume: 235
URL: https://proceedings.mlr.press/v235/altmeyer24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/altmeyer24a/altmeyer24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-altmeyer24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Patrick
family: Altmeyer
- given: Andrew M.
family: Demetriou
- given: Antony
family: Bartlett
- given: Cynthia C. S.
family: Liem
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1222-1242
id: altmeyer24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1222
lastpage: 1242
published: 2024-07-08 00:00:00 +0000
- title: 'Hyperbolic Optimizer as a Dynamical System'
abstract: 'During the last few years, the field of dynamical systems has been developing innovative tools to study the asymptotic behavior of different optimizers in the context of neural networks. In this work, we redefine an extensively studied optimizer, employing classical techniques from hyperbolic geometry. This new definition is linked to a non-linear differential equation as a continuous limit. Additionally, by utilizing Lyapunov stability concepts, we analyze the asymptotic behavior of its critical points.'
volume: 235
URL: https://proceedings.mlr.press/v235/alvarado24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/alvarado24a/alvarado24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-alvarado24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nico
family: Alvarado
- given: Hans
family: Lobel
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1243-1260
id: alvarado24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1243
lastpage: 1260
published: 2024-07-08 00:00:00 +0000
- title: 'Stationarity without mean reversion in improper Gaussian processes'
abstract: 'The behavior of a GP regression depends on the choice of covariance function. Stationary covariance functions are preferred in machine learning applications. However, (non-periodic) stationary covariance functions are always mean reverting and can therefore exhibit pathological behavior when applied to data that does not relax to a fixed global mean value. In this paper we show that it is possible to use improper GP priors with infinite variance to define processes that are stationary but not mean reverting. To this aim, we make use of non-positive kernels that can only be defined in this limit regime. The resulting posterior distributions can be computed analytically and involve a simple correction of the usual formulas. The main contribution of the paper is the introduction of a large family of smooth non-reverting covariance functions that closely resemble the kernels commonly used in the GP literature (e.g. squared exponential and Matérn class). By analyzing both synthetic and real data, we demonstrate that these non-positive kernels solve some known pathologies of mean reverting GP regression while retaining most of the favorable properties of ordinary smooth stationary kernels.'
volume: 235
URL: https://proceedings.mlr.press/v235/ambrogioni24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ambrogioni24a/ambrogioni24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ambrogioni24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Luca
family: Ambrogioni
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1261-1275
id: ambrogioni24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1261
lastpage: 1275
published: 2024-07-08 00:00:00 +0000
- title: 'Robust Graph Matching when Nodes are Corrupt'
abstract: 'Two models are introduced to study the problem of matching two correlated graphs when some of the nodes are corrupt. In the weak model, a random subset of nodes in one or both graphs can interact randomly with their network. For this model, it is shown that no estimator can correctly recover a positive fraction of the corrupt nodes. Necessary conditions for any estimator to correctly identify and match all the uncorrupt nodes are derived, and it is shown that these conditions are also sufficient for the k-core estimator. In the strong model, an adversarially selected subset of nodes in one or both graphs can interact arbitrarily with their network. For this model, detection of corrupt nodes is impossible. Even so, we show that if only one of the networks is compromised, then under appropriate conditions, the maximum overlap estimator can correctly match a positive fraction of nodes albeit without explicitly identifying them.'
volume: 235
URL: https://proceedings.mlr.press/v235/ameen24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ameen24a/ameen24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ameen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Taha
family: Ameen
- given: Bruce
family: Hajek
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1276-1305
id: ameen24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1276
lastpage: 1305
published: 2024-07-08 00:00:00 +0000
- title: 'Fast Algorithms for Hypergraph PageRank with Applications to Semi-Supervised Learning'
abstract: 'A fundamental approach to semi-supervised learning is to leverage the structure of the sample space to diffuse label information from annotated examples to unlabeled points. Traditional methods model the input data points as a graph and rely on fast algorithms for solving Laplacian systems of equations, such as those defining PageRank. However, previous work has demonstrated that graph-based models fail to capture higher-order relations, such as group membership, which are better modeled by hypergraphs. Unfortunately, the scalable application of hypergraph models has been hampered by the non-linearity of the hypergraph Laplacian. In this paper, we present highly scalable algorithms for hypergraph primitives, such as hypergraph PageRank vectors and hypergraph Laplacian systems, over general families of hypergraphs. In addition to giving strong theoretical guarantees, we empirically showcase the speed of our algorithms on benchmark instances of semi-supervised learning on categorical data. We exploit their generality to improve semi-supervised manifold clustering via hypergraph models. By providing significant speed-ups on fundamental hypergraph tasks, our algorithms enable the deployment of hypergraph models on a massive scale.'
volume: 235
URL: https://proceedings.mlr.press/v235/ameranis24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ameranis24a/ameranis24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ameranis24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Konstantinos
family: Ameranis
- given: Adela Frances
family: Depavia
- given: Lorenzo
family: Orecchia
- given: Erasmo
family: Tani
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1306-1330
id: ameranis24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1306
lastpage: 1330
published: 2024-07-08 00:00:00 +0000
- title: 'Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency'
abstract: 'To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately the space of all possible causal graphs is enormous so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely, by testing the conditional independence of variables. However, deciding if two variables are adjacent in a causal graph may require an exponential number of tests. Here we build a scalable and flexible method to evaluate if two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). DAT replaces an exponential number of tests with a provably equivalent relaxed problem. It then solves this problem by training two neural networks. We build a graph learning method based on DAT, DAT-Graph, that can also learn from data with interventions. DAT-Graph can learn graphs of 1000 variables with state of the art accuracy. Using the graph learned by DAT-Graph, we also build models that make much more accurate predictions of the effects of interventions on large scale RNA sequencing data.'
volume: 235
URL: https://proceedings.mlr.press/v235/amin24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/amin24a/amin24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-amin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alan Nawzad
family: Amin
- given: Andrew Gordon
family: Wilson
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1331-1358
id: amin24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1331
lastpage: 1358
published: 2024-07-08 00:00:00 +0000
- title: 'Generalization Error of Graph Neural Networks in the Mean-field Regime'
abstract: 'This work provides a theoretical framework for assessing the generalization error of graph neural networks in the over-parameterized regime, where the number of parameters surpasses the quantity of data points. We explore two widely utilized types of graph neural networks: graph convolutional neural networks and message passing graph neural networks. Prior to this study, existing bounds on the generalization error in the over-parametrized regime were uninformative, limiting our understanding of over-parameterized network performance. Our novel approach involves deriving upper bounds within the mean-field regime for evaluating the generalization error of these graph neural networks. We establish upper bounds with a convergence rate of $O(1/n)$, where $n$ is the number of graph samples. These upper bounds offer a theoretical assurance of the networks’ performance on unseen data in the challenging over-parameterized regime and overall contribute to our understanding of their performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/aminian24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/aminian24a/aminian24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-aminian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gholamali
family: Aminian
- given: Yixuan
family: He
- given: Gesine
family: Reinert
- given: Lukasz
family: Szpruch
- given: Samuel N.
family: Cohen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1359-1391
id: aminian24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1359
lastpage: 1391
published: 2024-07-08 00:00:00 +0000
- title: 'Scalable Online Exploration via Coverability'
abstract: 'Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation. We propose exploration objectives—policy optimization objectives that enable downstream maximization of any reward function—as a conceptual framework to systematize the study of exploration. We introduce a new objective, L1-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata: 1. *Intrinsic complexity control.* L1-Coverage is associated with a structural parameter, L1-Coverability, which reflects the intrinsic statistical difficulty of the underlying MDP, subsuming Block and Low-Rank MDPs. 2. *Efficient planning.* For a known MDP, L1-Coverage efficiently reduces to standard policy optimization, allowing flexible integration with off-the-shelf methods such as policy gradient and Q-learning approaches. 3. *Efficient exploration.* L1-Coverage enables the first computationally efficient model-based and model-free algorithms for online (reward-free or reward-driven) reinforcement learning in MDPs with low coverability. Empirically, we find that L1-Coverage effectively drives off-the-shelf policy optimization algorithms to explore the state space.'
volume: 235
URL: https://proceedings.mlr.press/v235/amortila24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/amortila24a/amortila24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-amortila24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Philip
family: Amortila
- given: Dylan J
family: Foster
- given: Akshay
family: Krishnamurthy
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1392-1455
id: amortila24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1392
lastpage: 1455
published: 2024-07-08 00:00:00 +0000
- title: 'WAVES: Benchmarking the Robustness of Image Watermarks'
abstract: 'In the burgeoning age of generative AI, watermarks act as identifiers of provenance and artificial content. We present WAVES (Watermark Analysis via Enhanced Stress-testing), a benchmark for assessing image watermark robustness, overcoming the limitations of current evaluation methods. WAVES integrates detection and identification tasks and establishes a standardized evaluation protocol comprised of a diverse range of stress tests. The attacks in WAVES range from traditional image distortions to advanced, novel variations of diffusive, and adversarial attacks. Our evaluation examines two pivotal dimensions: the degree of image quality degradation and the efficacy of watermark detection after attacks. Our novel, comprehensive evaluation reveals previously undetected vulnerabilities of several modern watermarking algorithms. We envision WAVES as a toolkit for the future development of robust watermarks.'
volume: 235
URL: https://proceedings.mlr.press/v235/an24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/an24a/an24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-an24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bang
family: An
- given: Mucong
family: Ding
- given: Tahseen
family: Rabbani
- given: Aakriti
family: Agrawal
- given: Yuancheng
family: Xu
- given: Chenghao
family: Deng
- given: Sicheng
family: Zhu
- given: Abdirisak
family: Mohamed
- given: Yuxin
family: Wen
- given: Tom
family: Goldstein
- given: Furong
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1456-1492
id: an24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1456
lastpage: 1492
published: 2024-07-08 00:00:00 +0000
- title: 'Training-Free Long-Context Scaling of Large Language Models'
abstract: 'The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose a training-free approach named Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of up to 100k tokens. By decomposing the attention computation for long sequences into chunk-based modules, DCA manages to effectively capture the relative positional information of tokens within the same chunk (Intra-Chunk) and across distinct chunks (Inter-Chunk), as well as integrates seamlessly with Flash Attention. In addition to its impressive extrapolation capability, DCA achieves performance on practical long-context tasks that is comparable to or even better than that of models built through continual training. All code and data used in this work are released at https://github.com/HKUNLP/ChunkLlama.'
volume: 235
URL: https://proceedings.mlr.press/v235/an24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/an24b/an24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-an24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chenxin
family: An
- given: Fei
family: Huang
- given: Jun
family: Zhang
- given: Shansan
family: Gong
- given: Xipeng
family: Qiu
- given: Chang
family: Zhou
- given: Lingpeng
family: Kong
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1493-1510
id: an24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1493
lastpage: 1510
published: 2024-07-08 00:00:00 +0000
- title: 'Navigating Scaling Laws: Compute Optimality in Adaptive Model Training'
abstract: 'In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: investing more computational resources (optimally) leads to better performance, and even predictably so; neural scaling laws have been derived that accurately forecast the performance of a network for a desired level of compute. This leads to the notion of a ’compute-optimal’ model, i.e. a model that allocates a given level of compute during training optimally to maximize performance. In this work, we extend the concept of optimality by allowing for an ’adaptive’ model, i.e. a model that can change its shape during training. By doing so, we can design adaptive models that optimally traverse between the underlying scaling laws and outpace their ‘static’ counterparts, leading to a significant reduction in the required compute to reach a given target performance. We show that our approach generalizes across modalities and different shape parameters.'
volume: 235
URL: https://proceedings.mlr.press/v235/anagnostidis24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/anagnostidis24a/anagnostidis24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-anagnostidis24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sotiris
family: Anagnostidis
- given: Gregor
family: Bachmann
- given: Imanol
family: Schlag
- given: Thomas
family: Hofmann
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1511-1530
id: anagnostidis24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1511
lastpage: 1530
published: 2024-07-08 00:00:00 +0000
- title: 'Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing'
abstract: 'Certification for machine learning is proving that no adversarial sample can evade a model within a range under certain conditions, a necessity for safety-critical domains. Common certification methods for segmentation use a flat set of fine-grained classes, leading to high abstain rates due to model uncertainty across many classes. We propose a novel, more practical setting, which certifies pixels within a multi-level hierarchy, and adaptively relaxes the certification to a coarser level for unstable components classic methods would abstain from, effectively lowering the abstain rate whilst providing more certified semantically meaningful information. We mathematically formulate the problem setup, introduce an adaptive hierarchical certification algorithm and prove the correctness of its guarantees. Since certified accuracy does not take the loss of information into account for coarser classes, we introduce the Certified Information Gain ($\mathrm{CIG}$) metric, which is proportional to the class granularity level. Our extensive experiments on the datasets Cityscapes, PASCAL-Context, ACDC and COCO-Stuff demonstrate that our adaptive algorithm achieves a higher $\mathrm{CIG}$ and lower abstain rate compared to the current state-of-the-art certification method. Our code can be found here: https://github.com/AlaaAnani/adaptive-certify.'
volume: 235
URL: https://proceedings.mlr.press/v235/anani24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/anani24a/anani24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-anani24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alaa
family: Anani
- given: Tobias
family: Lorenz
- given: Bernt
family: Schiele
- given: Mario
family: Fritz
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1531-1556
id: anani24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1531
lastpage: 1556
published: 2024-07-08 00:00:00 +0000
- title: 'Adaptive Observation Cost Control for Variational Quantum Eigensolvers'
abstract: 'The objective to be minimized in the variational quantum eigensolver (VQE) has a restricted form, which allows a specialized sequential minimal optimization (SMO) that requires only a few observations in each iteration. However, the SMO iteration is still costly due to the observation noise—one *observation* at a point typically requires averaging over hundreds to thousands of repeated quantum *measurement shots* for achieving a reasonable noise level. In this paper, we propose an adaptive cost control method, named *subspace in confident region* (SubsCoRe), for SMO. SubsCoRe uses the Gaussian process (GP) surrogate, and requires it to have low uncertainty over the subspace being updated, so that optimization in each iteration is performed with guaranteed accuracy. Adaptive cost control is performed by setting the required accuracy according to the progress of the optimization, and identifying the minimum number of measurement shots, as well as their distribution, satisfying the SubsCoRe requirement.'
volume: 235
URL: https://proceedings.mlr.press/v235/anders24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/anders24a/anders24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-anders24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christopher J.
family: Anders
- given: Kim Andrea
family: Nicoli
- given: Bingting
family: Wu
- given: Naima
family: Elosegui
- given: Samuele
family: Pedrielli
- given: Lena
family: Funcke
- given: Karl
family: Jansen
- given: Stefan
family: Kühn
- given: Shinichi
family: Nakajima
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1557-1578
id: anders24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1557
lastpage: 1578
published: 2024-07-08 00:00:00 +0000
- title: 'Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching'
abstract: 'While semidefinite programming (SDP) has traditionally been limited to moderate-sized problems, recent algorithms augmented with matrix sketching techniques have enabled solving larger SDPs. However, these methods achieve scalability at the cost of an increase in the number of necessary iterations, resulting in slower convergence as the problem size grows. Furthermore, they require iteration-dependent parameter schedules that prohibit effective utilization of warm-start initializations important in practical applications with incrementally-arriving data or mixed-integer programming. We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs that can leverage a warm-start initialization to further accelerate convergence. Our proposed algorithm is a spectral bundle method for solving general SDPs containing both equality and inequality constraints. Moreover, when augmented with an optional matrix sketching technique, our algorithm achieves the dramatically improved scalability of previous work while sustaining convergence speed. We empirically demonstrate the effectiveness of our method across multiple applications, with and without warm-starting. For example, USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables. We make our implementation in pure JAX publicly available.'
volume: 235
URL: https://proceedings.mlr.press/v235/angell24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/angell24a/angell24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-angell24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rico
family: Angell
- given: Andrew
family: Mccallum
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1579-1615
id: angell24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1579
lastpage: 1615
published: 2024-07-08 00:00:00 +0000
- title: 'Online conformal prediction with decaying step sizes'
abstract: 'We introduce a method for online conformal prediction with decaying step sizes. Like previous methods, ours possesses a retrospective guarantee of coverage for arbitrary sequences. However, unlike previous methods, we can simultaneously estimate a population quantile when it exists. Our theory and experiments indicate substantially improved practical properties: in particular, when the distribution is stable, the coverage is close to the desired level *for every time point*, not just on average over the observed sequence.'
volume: 235
URL: https://proceedings.mlr.press/v235/angelopoulos24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/angelopoulos24a/angelopoulos24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-angelopoulos24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anastasios Nikolas
family: Angelopoulos
- given: Rina
family: Barber
- given: Stephen
family: Bates
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1616-1630
id: angelopoulos24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1616
lastpage: 1630
published: 2024-07-08 00:00:00 +0000
- title: 'A Rate-Distortion View of Uncertainty Quantification'
abstract: 'In supervised learning, understanding an input’s proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction. While powerful probabilistic models such as Gaussian Processes naturally have this property, deep neural networks often lack it. In this paper, we introduce Distance Aware Bottleneck (DAB), i.e., a new method for enriching deep neural networks with this property. Building on prior information bottleneck approaches, our method learns a codebook that stores a compressed representation of all inputs seen during training. The distance of a new example from this codebook can serve as an uncertainty estimate for the example. The resulting model is simple to train and provides deterministic uncertainty estimates by a single forward pass. Finally, our method achieves better out-of-distribution (OOD) detection and misclassification prediction than prior methods, including expensive ensemble methods, deep kernel Gaussian Processes, and approaches based on the standard information bottleneck.'
volume: 235
URL: https://proceedings.mlr.press/v235/apostolopoulou24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/apostolopoulou24a/apostolopoulou24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-apostolopoulou24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ifigeneia
family: Apostolopoulou
- given: Benjamin
family: Eysenbach
- given: Frank
family: Nielsen
- given: Artur
family: Dubrawski
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1631-1654
id: apostolopoulou24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1631
lastpage: 1654
published: 2024-07-08 00:00:00 +0000
- title: 'Practical Performance Guarantees for Pipelined DNN Inference'
abstract: 'We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We give practical and effective algorithms for this NP-hard problem, but our emphasis is on tackling the practitioner’s dilemma of deciding when a solution is good enough. To this end, we design novel mixed integer programming (MIP) relaxations for proving lower bounds. Applying these methods to a diverse testbed of 369 production models, for $k \in \\{2, 4, 8, 16, 32, 64\\}$, we empirically show that these lower bounds are strong enough to be useful in practice. Our lower bounds are substantially stronger than standard combinatorial bounds. For example, evaluated via geometric means across a production testbed with $k = 16$ pipeline stages, our MIP formulations raise the lower bound from 0.4598 to 0.9452, expressed as a fraction of the best partition found. In other words, our improved lower bounds close the optimality gap by a factor of 9.855x.'
volume: 235
URL: https://proceedings.mlr.press/v235/archer24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/archer24a/archer24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-archer24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aaron
family: Archer
- given: Matthew
family: Fahrbach
- given: Kuikui
family: Liu
- given: Prakash
family: Prabhu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1655-1671
id: archer24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1655
lastpage: 1671
published: 2024-07-08 00:00:00 +0000
- title: 'Unsupervised Concept Discovery Mitigates Spurious Correlations'
abstract: 'Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases. Addressing this challenge typically involves methods relying on prior knowledge and group annotation to remove spurious correlations, which may not be readily available in many applications. In this paper, we establish a novel connection between unsupervised object-centric learning and mitigation of spurious correlations. Instead of directly inferring subgroups with varying correlations with labels, our approach focuses on discovering concepts: discrete ideas that are shared across input samples. Leveraging existing object-centric representation learning, we introduce CoBalT: a concept balancing technique that effectively mitigates spurious correlations without requiring human labeling of subgroups. Evaluation across the benchmark datasets for sub-population shifts demonstrates superior or competitive performance compared to state-of-the-art baselines, without the need for group annotation. Code is available at https://github.com/rarefin/CoBalT'
volume: 235
URL: https://proceedings.mlr.press/v235/arefin24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/arefin24a/arefin24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-arefin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Md Rifat
family: Arefin
- given: Yan
family: Zhang
- given: Aristide
family: Baratin
- given: Francesco
family: Locatello
- given: Irina
family: Rish
- given: Dianbo
family: Liu
- given: Kenji
family: Kawaguchi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1672-1688
id: arefin24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1672
lastpage: 1688
published: 2024-07-08 00:00:00 +0000
- title: 'Accelerating Legacy Numerical Solvers by Non-intrusive Gradient-based Meta-solving'
abstract: 'Scientific computing is an essential tool for scientific discovery and engineering design, and its computational cost is always a main concern in practice. To accelerate scientific computing, it is a promising approach to use machine learning (especially meta-learning) techniques for selecting hyperparameters of traditional numerical methods. There have been numerous proposals to this direction, but many of them require automatic-differentiable numerical methods. However, in reality, many practical applications still depend on well-established but non-automatic-differentiable legacy codes, which prevents practitioners from applying the state-of-the-art research to their own problems. To resolve this problem, we propose a non-intrusive methodology with a novel gradient estimation technique to combine machine learning and legacy numerical codes without any modification. We theoretically and numerically show the advantage of the proposed method over other baselines and present applications of accelerating established non-automatic-differentiable numerical solvers implemented in PETSc, a widely used open-source numerical software library.'
volume: 235
URL: https://proceedings.mlr.press/v235/arisaka24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/arisaka24a/arisaka24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-arisaka24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sohei
family: Arisaka
- given: Qianxiao
family: Li
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1689-1708
id: arisaka24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1689
lastpage: 1708
published: 2024-07-08 00:00:00 +0000
- title: 'Causal Action Influence Aware Counterfactual Data Augmentation'
abstract: 'Offline data are both valuable and practical resources for teaching robots complex behaviors. Ideally, learning agents should not be constrained by the scarcity of available demonstrations, but rather generalize beyond the training distribution. However, the complexity of real-world scenarios typically requires huge amounts of data to prevent neural network policies from picking up on spurious correlations and learning non-causal relationships. We propose CAIAC, a data augmentation method that can create feasible synthetic transitions from a fixed dataset without having access to online environment interactions. By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping $\textit{action}$-unaffected parts of the state-space between independent trajectories in the dataset. We empirically show that this leads to a substantial increase in robustness of offline learning algorithms against distributional shift.'
volume: 235
URL: https://proceedings.mlr.press/v235/armengol-urpi-24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/armengol-urpi-24a/armengol-urpi-24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-armengol-urpi-24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Núria
  family: Armengol Urpí
- given: Marco
family: Bagatella
- given: Marin
family: Vlastelica
- given: Georg
family: Martius
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1709-1729
id: armengol-urpi-24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1709
lastpage: 1729
published: 2024-07-08 00:00:00 +0000
- title: 'Online Learning and Information Exponents: The Importance of Batch size & Time/Complexity Tradeoffs'
abstract: 'We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as characterized by the information exponents. We show that performing gradient updates with large batches $n_b \lesssim d^{\frac{\ell}{2}}$ minimizes the training time without changing the total sample complexity, where $\ell$ is the information exponent of the target to be learned and $d$ is the input dimension. However, larger batch sizes than $n_b \gg d^{\frac{\ell}{2}}$ are detrimental for improving the time complexity of SGD. We provably overcome this fundamental limitation via a different training protocol, *Correlation loss SGD*, which suppresses the auto-correlation terms in the loss function. We show that one can track the training progress by a system of low-dimensional ordinary differential equations (ODEs). Finally, we validate our theoretical results with numerical experiments.'
volume: 235
URL: https://proceedings.mlr.press/v235/arnaboldi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/arnaboldi24a/arnaboldi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-arnaboldi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Luca
family: Arnaboldi
- given: Yatin
family: Dandi
- given: Florent
family: Krzakala
- given: Bruno
family: Loureiro
- given: Luca
family: Pesce
- given: Ludovic
family: Stephan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1730-1762
id: arnaboldi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1730
lastpage: 1762
published: 2024-07-08 00:00:00 +0000
- title: 'Simple linear attention language models balance the recall-throughput tradeoff'
abstract: 'Recent work has shown that attention-based language models excel at "recall", the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is bottle-necked during inference by the KV-cache’s aggressive memory consumption. In this work, we explore whether we can improve language model efficiency (e.g. by reducing memory consumption) without compromising on recall. By applying experiments and theory to a broad set of architectures, we identify a key tradeoff between a model’s recurrent state size and recall ability. We show that efficient alternatives to attention (e.g. H3, Mamba, RWKV) maintain a fixed-size recurrent state, but struggle at recall. We propose BASED, a simple architecture combining linear and sliding window attention. By varying BASED window size and linear attention feature dimension, we can dial the state size and traverse the Pareto frontier of the recall-memory tradeoff curve, recovering the full quality of attention on one end and the small state size of attention-alternatives on the other. We train language models up to $1.3$b parameters and show that BASED matches the strongest sub-quadratic models (e.g. Mamba) in perplexity and outperforms them on real-world recall-intensive tasks by 10.36 accuracy points. We further develop IO-aware algorithms that enable BASED to provide 24× higher throughput on language generation than FlashAttention-2, when generating 1024 tokens using 1.3b parameter models. Overall, BASED expands the Pareto frontier of the throughput-recall tradeoff space beyond prior architectures.'
volume: 235
URL: https://proceedings.mlr.press/v235/arora24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/arora24a/arora24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-arora24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simran
family: Arora
- given: Sabri
family: Eyuboglu
- given: Michael
family: Zhang
- given: Aman
family: Timalsina
- given: Silas
family: Alberti
- given: James
family: Zou
- given: Atri
family: Rudra
- given: Christopher
family: Re
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1763-1840
id: arora24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1763
lastpage: 1840
published: 2024-07-08 00:00:00 +0000
- title: 'Inferring Change Points in High-Dimensional Linear Regression via Approximate Message Passing'
abstract: 'We consider the problem of localizing change points in high-dimensional linear regression. We propose an Approximate Message Passing (AMP) algorithm for estimating both the signals and the change point locations. Assuming Gaussian covariates, we give an exact asymptotic characterization of its estimation performance in the limit where the number of samples grows proportionally to the signal dimension. Our algorithm can be tailored to exploit any prior information on the signal, noise, and change points. It also enables uncertainty quantification in the form of an efficiently computable approximate posterior distribution, whose asymptotic form we characterize exactly. We validate our theory via numerical experiments, and demonstrate the favorable performance of our estimators on both synthetic data and images.'
volume: 235
URL: https://proceedings.mlr.press/v235/arpino24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/arpino24a/arpino24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-arpino24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gabriel
family: Arpino
- given: Xiaoqi
family: Liu
- given: Ramji
family: Venkataramanan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1841-1864
id: arpino24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1841
lastpage: 1864
published: 2024-07-08 00:00:00 +0000
- title: 'An amortized approach to non-linear mixed-effects modeling based on neural posterior estimation'
abstract: 'Non-linear mixed-effects models are a powerful tool for studying heterogeneous populations in various fields, including biology, medicine, economics, and engineering. Here, the aim is to find a distribution over the parameters that describe the whole population using a model that can generate simulations for an individual of that population. However, fitting these distributions to data is computationally challenging if the description of individuals is complex and the population is large. To address this issue, we propose a novel machine learning-based approach: We exploit neural density estimation based on conditional normalizing flows to approximate individual-specific posterior distributions in an amortized fashion, thereby allowing for efficient inference of population parameters. Applying this approach to problems from cell biology and pharmacology, we demonstrate its flexibility and scalability to large data sets compared to established methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/arruda24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/arruda24a/arruda24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-arruda24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonas
family: Arruda
- given: Yannik
family: Schälte
- given: Clemens
family: Peiter
- given: Olga
family: Teplytska
- given: Ulrich
family: Jaehde
- given: Jan
family: Hasenauer
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1865-1901
id: arruda24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1865
lastpage: 1901
published: 2024-07-08 00:00:00 +0000
- title: 'Learning the Target Network in Function Space'
abstract: 'We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.'
volume: 235
URL: https://proceedings.mlr.press/v235/asadi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/asadi24a/asadi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-asadi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kavosh
family: Asadi
- given: Yao
family: Liu
- given: Shoham
family: Sabach
- given: Ming
family: Yin
- given: Rasool
family: Fakoor
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1902-1923
id: asadi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1902
lastpage: 1923
published: 2024-07-08 00:00:00 +0000
- title: 'Translation Equivariant Transformer Neural Processes'
abstract: 'The effectiveness of neural processes (NPs) in modelling posterior prediction maps—the mapping from data to posterior predictive distributions—has significantly improved since their inception. This improvement can be attributed to two principal factors: (1) advancements in the architecture of permutation invariant set functions, which are intrinsic to all NPs; and (2) leveraging symmetries present in the true posterior predictive map, which are problem dependent. Transformers are a notable development in permutation invariant set functions, and their utility within NPs has been demonstrated through the family of models we refer to as TNPs. Despite significant interest in TNPs, little attention has been given to incorporating symmetries. Notably, the posterior prediction maps for data that are stationary—a common assumption in spatio-temporal modelling—exhibit translation equivariance. In this paper, we introduce a new family of TNPs that incorporate *translation equivariance*. Through an extensive range of experiments on synthetic and real-world spatio-temporal data, we demonstrate the effectiveness of TE-TNPs relative to their non-translation-equivariant counterparts and other NP baselines.'
volume: 235
URL: https://proceedings.mlr.press/v235/ashman24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ashman24a/ashman24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ashman24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Ashman
- given: Cristiana
family: Diaconu
- given: Junhyuck
family: Kim
- given: Lakee
family: Sivaraya
- given: Stratis
family: Markou
- given: James
family: Requeima
- given: Wessel P
family: Bruinsma
- given: Richard E.
family: Turner
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1924-1944
id: ashman24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1924
lastpage: 1944
published: 2024-07-08 00:00:00 +0000
- title: 'Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages'
abstract: 'We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v^{(i)} \in \mathbb{R}^d$. We propose a new multi-message protocol that achieves the optimal error using $O(\min(n\varepsilon^2,d))$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error must require each user to send $\Omega(\min(n\varepsilon^2,d)/\log(n))$ messages, demonstrating the optimality of our message complexity up to logarithmic factors. Additionally, we study the single-message setting and design a protocol that achieves mean squared error $O(dn^{d/(d+2)}\varepsilon^{-4/(d+2)})$. Moreover, we show that *any* single-message protocol must incur mean squared error $\Omega(dn^{d/(d+2)})$, showing that our protocol is optimal in the standard setting where $\varepsilon = \Theta(1)$. Finally, we study robustness to malicious users and show that malicious users can incur large additive error with a single shuffler.'
volume: 235
URL: https://proceedings.mlr.press/v235/asi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/asi24a/asi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-asi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hilal
family: Asi
- given: Vitaly
family: Feldman
- given: Jelani
family: Nelson
- given: Huy
family: Nguyen
- given: Kunal
family: Talwar
- given: Samson
family: Zhou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1945-1970
id: asi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1945
lastpage: 1970
published: 2024-07-08 00:00:00 +0000
- title: 'Bifurcated Attention for Single-Context Large-Batch Sampling'
abstract: 'In our study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling settings. This approach aims to reduce redundant memory IO costs, a significant factor in latency for high batch sizes and long context lengths. Bifurcated attention achieves this by dividing the attention mechanism during incremental decoding into two distinct GEMM operations, focusing on the KV cache from prefill and the decoding process. This method ensures precise computation and maintains the usual computational load (FLOPs) of standard attention mechanisms, but with reduced memory IO. Bifurcated attention is also compatible with the multi-query attention mechanism, known for its reduced memory IO for the KV cache, further enabling higher batch size and context length. The resulting efficiency leads to lower latency, improving suitability for real-time applications, e.g., enabling massively-parallel answer generation without substantially increasing latency, and enhancing performance when integrated with post-processing techniques such as reranking.'
volume: 235
URL: https://proceedings.mlr.press/v235/athiwaratkun24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/athiwaratkun24a/athiwaratkun24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-athiwaratkun24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ben
family: Athiwaratkun
- given: Sujan Kumar
family: Gonugondla
- given: Sanjay Krishna
family: Gouda
- given: Haifeng
family: Qian
- given: Hantian
family: Ding
- given: Qing
family: Sun
- given: Jun
family: Wang
- given: Jiacheng
family: Guo
- given: Liangfu
family: Chen
- given: Parminder
family: Bhatia
- given: Ramesh
family: Nallapati
- given: Sudipta
family: Sengupta
- given: Bing
family: Xiang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1971-1991
id: athiwaratkun24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1971
lastpage: 1991
published: 2024-07-08 00:00:00 +0000
- title: 'Delaunay Graph: Addressing Over-Squashing and Over-Smoothing Using Delaunay Triangulation'
abstract: 'GNNs rely on the exchange of messages to distribute information along the edges of the graph. This approach makes the efficiency of architectures highly dependent on the specific structure of the input graph. Certain graph topologies lead to inefficient information propagation, resulting in a phenomenon known as over-squashing. While the majority of existing methods address over-squashing by rewiring the input graph, our novel approach involves constructing a graph directly from features using Delaunay Triangulation. We posit that the topological properties of the resulting graph prove advantageous for mitigating over-smoothing and over-squashing. Our extensive experimentation demonstrates that our method consistently outperforms established graph rewiring methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/attali24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/attali24a/attali24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-attali24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hugo
family: Attali
- given: Davide
family: Buscaldi
- given: Nathalie
family: Pernelle
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 1992-2008
id: attali24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 1992
lastpage: 2008
published: 2024-07-08 00:00:00 +0000
- title: 'How Free is Parameter-Free Stochastic Optimization?'
abstract: 'We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods, without requiring significant knowledge of the true problem parameters. Existing parameter-free methods can only be considered “partially” parameter-free, as they require some non-trivial knowledge of the true problem parameters, such as a bound on the stochastic gradient norms, a bound on the distance to a minimizer, etc. In the non-convex setting, we demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms. We also provide a similar result in the convex setting with access to noisy function values under mild noise assumptions. Finally, assuming only access to stochastic gradients, we establish a lower bound that renders fully parameter-free stochastic convex optimization infeasible, and provide a method which is (partially) parameter-free up to the limit indicated by our lower bound.'
volume: 235
URL: https://proceedings.mlr.press/v235/attia24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/attia24a/attia24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-attia24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amit
family: Attia
- given: Tomer
family: Koren
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2009-2034
id: attia24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2009
lastpage: 2034
published: 2024-07-08 00:00:00 +0000
- title: 'Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Memorization, and Tracing'
abstract: 'In this work, we investigate the interplay between memorization and learning in the context of *stochastic convex optimization* (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the $L^2$ Lipschitz–bounded setting and under strong convexity, every learner with an excess error $\epsilon$ has CMI bounded below by $\Omega(1/\epsilon^2)$ and $\Omega(1/\epsilon)$, respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems.'
volume: 235
URL: https://proceedings.mlr.press/v235/attias24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/attias24a/attias24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-attias24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Idan
family: Attias
- given: Gintare Karolina
family: Dziugaite
- given: Mahdi
family: Haghifam
- given: Roi
family: Livni
- given: Daniel M.
family: Roy
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2035-2068
id: attias24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2035
lastpage: 2068
published: 2024-07-08 00:00:00 +0000
- title: 'Agnostic Sample Compression Schemes for Regression'
abstract: 'We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes exhibiting exponential size in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, an approximate compression of size linear in the dimension is constructed. Moreover, for $\ell_1$ and $\ell_\infty$ losses, we can even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff (2016) for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with $\ell_1$ loss, does every function class admit an exact compression scheme of polynomial size in the pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of polynomial size in the fat-shattering dimension? These questions generalize Warmuth’s classic sample compression conjecture for realizable-case classification (Warmuth, 2003).'
volume: 235
URL: https://proceedings.mlr.press/v235/attias24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/attias24b/attias24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-attias24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Idan
family: Attias
- given: Steve
family: Hanneke
- given: Aryeh
family: Kontorovich
- given: Menachem
family: Sadigurschi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2069-2085
id: attias24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2069
lastpage: 2085
published: 2024-07-08 00:00:00 +0000
- title: 'Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond'
abstract: 'We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and sensitivity sampling. Assuming access to an embedding representation of the data with respect to which the model loss is Hölder continuous, our approach provably allows selecting a set of “typical” $k + 1/\varepsilon^2$ elements whose average loss corresponds to the average loss of the whole dataset, up to a multiplicative $(1\pm\varepsilon)$ factor and an additive $\varepsilon \lambda \Phi_k$, where $\Phi_k$ represents the $k$-means cost for the input embeddings and $\lambda$ is the Hölder constant. We furthermore demonstrate the performance and scalability of our approach on fine-tuning foundation models and show that it outperforms state-of-the-art methods. We also show how it can be applied on linear regression, leading to a new sampling strategy that surprisingly matches the performance of leverage score sampling, while being conceptually simpler and more scalable.'
volume: 235
URL: https://proceedings.mlr.press/v235/axiotis24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/axiotis24a/axiotis24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-axiotis24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kyriakos
family: Axiotis
- given: Vincent
family: Cohen-Addad
- given: Monika
family: Henzinger
- given: Sammy
family: Jerome
- given: Vahab
family: Mirrokni
- given: David
family: Saulpic
- given: David
family: Woodruff
- given: Michael
family: Wunder
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2086-2107
id: axiotis24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2086
lastpage: 2107
published: 2024-07-08 00:00:00 +0000
- title: 'Random features models: a way to study the success of naive imputation'
abstract: 'Constant (naive) imputation is still widely used in practice, as it is an easy-to-use first technique for dealing with missing data. Yet, this simple method could be expected to induce a large bias for prediction purposes, as the imputed input may strongly differ from the true underlying data. However, recent works suggest that this bias is low in the context of high-dimensional linear predictors when data is supposed to be missing completely at random (MCAR). This paper completes the picture for linear predictors by confirming the intuition that the bias is negligible and that, surprisingly, naive imputation also remains relevant in very low dimension. To this end, we consider a unique underlying random features model, which offers a rigorous framework for studying predictive performances, whilst the dimension of the observed features varies. Building on these theoretical results, we establish finite-sample bounds on stochastic gradient descent (SGD) predictors applied to zero-imputed data, a strategy particularly well suited for large-scale learning. While the MCAR assumption may appear strong, we show that similar favorable behaviors occur for more complex missing data scenarios.'
volume: 235
URL: https://proceedings.mlr.press/v235/ayme24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ayme24a/ayme24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ayme24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexis
family: Ayme
- given: Claire
family: Boyer
- given: Aymeric
family: Dieuleveut
- given: Erwan
family: Scornet
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2108-2134
id: ayme24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2108
lastpage: 2134
published: 2024-07-08 00:00:00 +0000
- title: 'Switching the Loss Reduces the Cost in Batch Reinforcement Learning'
abstract: 'We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-LOG scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving small-cost bounds, i.e. bounds that scale with the optimal achievable cost, in batch RL. Moreover, we empirically verify that FQI-LOG uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal.'
volume: 235
URL: https://proceedings.mlr.press/v235/ayoub24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ayoub24a/ayoub24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ayoub24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alex
family: Ayoub
- given: Kaiwen
family: Wang
- given: Vincent
family: Liu
- given: Samuel
family: Robertson
- given: James
family: Mcinerney
- given: Dawen
family: Liang
- given: Nathan
family: Kallus
- given: Csaba
family: Szepesvari
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2135-2158
id: ayoub24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2135
lastpage: 2158
published: 2024-07-08 00:00:00 +0000
- title: 'Bipartite Matching in Massive Graphs: A Tight Analysis of EDCS'
abstract: 'Maximum matching is one of the most fundamental combinatorial optimization problems with applications in various contexts such as balanced clustering, data mining, resource allocation, and online advertisement. In many of these applications, the input graph is massive. The sheer size of these inputs makes it impossible to store the whole graph in the memory of a single machine and process it there. Graph sparsification has been an extremely powerful tool to alleviate this problem. In this paper, we study a highly successful and versatile sparsifier for the matching problem: the *edge-degree constrained subgraph (EDCS)*, first introduced by Bernstein & Stein (2015). The EDCS has a parameter $\beta \geq 2$ which controls the density of the sparsifier. It has been shown through various proofs in the literature that by picking a subgraph with $O(n\beta)$ edges, the EDCS includes a matching of size at least $2/3-O(1/\beta)$ times the maximum matching size. As such, by increasing $\beta$ the approximation ratio of EDCS gets closer and closer to $2/3$. In this paper, we propose a new approach for analyzing the approximation ratio of EDCS. Our analysis is *tight* for any value of $\beta$. Namely, we pinpoint the precise approximation ratio of EDCS for any sparsity parameter $\beta$. Our analysis reveals that one does not necessarily need to increase $\beta$ to improve approximation, as suggested by previous analysis. In particular, the best choice turns out to be $\beta = 6$, which achieves an approximation ratio of $.677$! This is arguably surprising as it is even better than $2/3 \sim .666$, the bound that was widely believed to be the limit for EDCS.'
volume: 235
URL: https://proceedings.mlr.press/v235/azarmehr24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/azarmehr24a/azarmehr24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-azarmehr24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amir
family: Azarmehr
- given: Soheil
family: Behnezhad
- given: Mohammad
family: Roghani
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2159-2167
id: azarmehr24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2159
lastpage: 2167
published: 2024-07-08 00:00:00 +0000
- title: 'What is the Long-Run Distribution of Stochastic Gradient Descent? A Large Deviations Analysis'
abstract: 'In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem’s state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method’s step-size and energy levels determined by the problem’s objective and the statistics of the noise. In particular, we show that, in the long run, (*a*) the problem’s critical region is visited exponentially more often than any non-critical region; (*b*) the iterates of SGD are exponentially concentrated around the problem’s minimum energy state (which does not always coincide with the global minimum of the objective); (*c*) all other connected components of critical points are visited with frequency that is exponentially proportional to their energy level; and, finally, (*d*) any component of local maximizers or saddle points is "dominated" by a component of local minimizers which is visited exponentially more often.'
volume: 235
URL: https://proceedings.mlr.press/v235/azizian24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/azizian24a/azizian24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-azizian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Waı̈ss
family: Azizian
- given: Franck
family: Iutzeler
- given: Jerome
family: Malick
- given: Panayotis
family: Mertikopoulos
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2168-2229
id: azizian24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2168
lastpage: 2229
published: 2024-07-08 00:00:00 +0000
- title: 'HyperFields: Towards Zero-Shot Generation of NeRFs from Text'
abstract: 'We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. These techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs, and consequently is capable of predicting novel in-distribution and out-of-distribution scenes — either zero-shot or with a few finetuning steps. Finetuning HyperFields benefits from accelerated convergence thanks to the learned general map, and is capable of synthesizing novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.'
volume: 235
URL: https://proceedings.mlr.press/v235/babu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/babu24a/babu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-babu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sudarshan
family: Babu
- given: Richard
family: Liu
- given: Avery
family: Zhou
- given: Michael
family: Maire
- given: Greg
family: Shakhnarovich
- given: Rana
family: Hanocka
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2230-2247
id: babu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2230
lastpage: 2247
published: 2024-07-08 00:00:00 +0000
- title: 'Online Matrix Completion: A Collaborative Approach with Hott Items'
abstract: 'We investigate the low rank matrix completion problem in an online setting with ${M}$ users, ${N}$ items, ${T}$ rounds, and an unknown rank-$r$ reward matrix ${R}\in \mathbb{R}^{{M}\times {N}}$. This problem has been well-studied in the literature and has several applications in practice. In each round, we recommend ${S}$ carefully chosen distinct items to every user and observe noisy rewards. In the regime where ${M},{N} \gg {T}$, we propose two distinct computationally efficient algorithms for recommending items to users and analyze them under the benign *hott items* assumption. 1) First, for ${S}=1$, under additional incoherence/smoothness assumptions on ${R}$, we propose the phased algorithm PhasedClusterElim. Our algorithm obtains a near-optimal per-user regret of $\tilde{O}({N}{M}^{-1}(\Delta^{-1}+\Delta_{\text{hott}}^{-2}))$ where $\Delta_{\text{hott}},\Delta$ are problem-dependent gap parameters with $\Delta_{\text{hott}} \gg \Delta$ almost always. 2) Second, we consider a simplified setting with ${S}=r$ where we make significantly milder assumptions on ${R}$. Here, we introduce another phased algorithm, DeterminantElim, to derive a regret guarantee of $\tilde{O}({N}{M}^{-1/r}\Delta_\text{det}^{-1})$ where $\Delta_{\text{det}}$ is another problem-dependent gap. Both algorithms crucially use collaboration among users to jointly eliminate sub-optimal items for groups of users successively in phases, but with distinctive and novel approaches.'
volume: 235
URL: https://proceedings.mlr.press/v235/baby24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/baby24a/baby24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-baby24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dheeraj
family: Baby
- given: Soumyabrata
family: Pal
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2248-2276
id: baby24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2248
lastpage: 2276
published: 2024-07-08 00:00:00 +0000
- title: 'Differentiable Weightless Neural Networks'
abstract: 'We introduce the Differentiable Weightless Neural Network (DWN), a model based on interconnected lookup tables. Training of DWNs is enabled by a novel Extended Finite Difference technique for approximate differentiation of binary values. We propose Learnable Mapping, Learnable Reduction, and Spectral Regularization to further improve the accuracy and efficiency of these models. We evaluate DWNs in three edge computing contexts: (1) an FPGA-based hardware accelerator, where they demonstrate superior latency, throughput, energy efficiency, and model area compared to state-of-the-art solutions, (2) a low-power microcontroller, where they achieve preferable accuracy to XGBoost while subject to stringent memory constraints, and (3) ultra-low-cost chips, where they consistently outperform small models in both accuracy and projected hardware area. DWNs also compare favorably against leading approaches for tabular datasets, with higher average rank. Overall, our work positions DWNs as a pioneering solution for edge-compatible high-throughput neural networks.'
volume: 235
URL: https://proceedings.mlr.press/v235/bacellar24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bacellar24a/bacellar24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bacellar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alan Tendler Leibel
family: Bacellar
- given: Zachary
family: Susskind
- given: Mauricio
family: Breternitz Jr
- given: Eugene
family: John
- given: Lizy Kurian
family: John
- given: Priscila Machado Vieira
family: Lima
- given: Felipe M.G.
family: França
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2277-2295
id: bacellar24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2277
lastpage: 2295
published: 2024-07-08 00:00:00 +0000
- title: 'The Pitfalls of Next-Token Prediction'
abstract: 'Can a mere next-token predictor faithfully model human thinking? Our work is aimed at crystallizing this intuitive concern, which is currently fragmented in the literature. First, we emphasize isolating the two phases of next-token prediction that are often conflated: autoregression during inference vs. teacher-forcing during training. We argue that the previously-identified problem of "exponential error accumulation" is a symptom of autoregressive inference. But more concerningly, we identify that teacher-forcing can let the model fit the training data by cheating, causing total in-distribution failure. We design a minimal planning task where empirically both the Transformer and the Mamba architecture fail in this manner - remarkably, despite the task being easy to learn. Overall, our work consolidates these and other essential arguments surrounding next-token prediction. We hope this effort can ground future discussions and inspire explorations beyond the next-token prediction paradigm.'
volume: 235
URL: https://proceedings.mlr.press/v235/bachmann24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bachmann24a/bachmann24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bachmann24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gregor
family: Bachmann
- given: Vaishnavh
family: Nagarajan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2296-2318
id: bachmann24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2296
lastpage: 2318
published: 2024-07-08 00:00:00 +0000
- title: 'Simulation of Graph Algorithms with Looped Transformers'
abstract: 'The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps with relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture we use is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate individual algorithms such as Dijkstra’s shortest path, Breadth- and Depth-First Search, and Kosaraju’s strongly connected components, as well as multiple algorithms simultaneously. The number of parameters in the networks does not increase with the input graph size, which implies that the networks can simulate the above algorithms for any graph. Despite this property, we show a limit to simulation in our solution due to finite precision. Finally, we show a Turing Completeness result with constant width when the extra attention heads are utilized.'
volume: 235
URL: https://proceedings.mlr.press/v235/back-de-luca24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/back-de-luca24a/back-de-luca24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-back-de-luca24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Artur
family: Back De Luca
- given: Kimon
family: Fountoulakis
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2319-2363
id: back-de-luca24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2319
lastpage: 2363
published: 2024-07-08 00:00:00 +0000
- title: 'QBMK: Quantum-based Matching Kernels for Un-attributed Graphs'
abstract: 'In this work, we develop a new Quantum-based Matching Kernel (QBMK) for un-attributed graphs, by computing the kernel-based similarity between the quantum Shannon entropies of aligned vertices through the Continuous-time Quantum Walk (CTQW). The theoretical analysis reveals that the proposed QBMK kernel not only addresses the shortcoming of neglecting the structural correspondence information between graphs arising in existing R-convolution graph kernels, but also overcomes the problem of neglecting the structural differences between pairs of aligned vertices arising in existing vertex-based matching kernels. Moreover, the proposed QBMK kernel can simultaneously capture both global and local structural characteristics through the quantum Shannon entropies. Experimental evaluations on standard graph datasets demonstrate that the proposed QBMK kernel is able to outperform state-of-the-art graph kernels and graph deep learning approaches.'
volume: 235
URL: https://proceedings.mlr.press/v235/bai24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bai24a/bai24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bai24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lu
family: Bai
- given: Lixin
family: Cui
- given: Ming
family: Li
- given: Yue
family: Wang
- given: Edwin
family: Hancock
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2364-2374
id: bai24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2364
lastpage: 2374
published: 2024-07-08 00:00:00 +0000
- title: 'Diffusion Models Demand Contrastive Guidance for Adversarial Purification to Advance'
abstract: 'In adversarial defense, adversarial purification can be viewed as a special generation task with the purpose of removing adversarial attacks, and diffusion models excel in adversarial purification due to their strong generative power. With different predetermined generation requirements, various types of guidance have been proposed, but few of them focus on adversarial purification. In this work, we propose to guide diffusion models for adversarial purification using contrastive guidance. We theoretically derive the proper noise level added in the forward process of diffusion models for adversarial purification from a feature learning perspective. For the reverse process, it is implied that the role of contrastive loss guidance is to facilitate the evolution towards the signal direction. From the theoretical findings and implications, we design the forward process with the proper amount of Gaussian noise added and the reverse process with the gradient of contrastive loss as the guidance of diffusion models for adversarial purification. Empirically, extensive experiments on CIFAR-10, CIFAR-100, the German Traffic Sign Recognition Benchmark and ImageNet datasets with ResNet and WideResNet classifiers show that our method outperforms most current adversarial training and adversarial purification methods by a large margin.'
volume: 235
URL: https://proceedings.mlr.press/v235/bai24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bai24b/bai24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bai24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mingyuan
family: Bai
- given: Wei
family: Huang
- given: Tenghui
family: Li
- given: Andong
family: Wang
- given: Junbin
family: Gao
- given: Cesar F
family: Caiafa
- given: Qibin
family: Zhao
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2375-2391
id: bai24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2375
lastpage: 2391
published: 2024-07-08 00:00:00 +0000
- title: 'On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition'
abstract: 'This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak–Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\epsilon))$ incremental first-order oracle (IFO) calls to find an $\epsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches upper bounds of IFO complexity for best-known first-order methods. We also study the problem of minimizing the PL function in the distributed setting such that the individuals $f_1(\cdot),…,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\log(1/\epsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma})\log(1/\epsilon))$ and $\Omega\big(n+\kappa\sqrt{n}\log(1/\epsilon)\big)$ for communication rounds, time cost and local first-order oracle calls respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost of per communication round. Furthermore, we propose a decentralized first-order method that nearly matches above lower bounds in expectation.'
volume: 235
URL: https://proceedings.mlr.press/v235/bai24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bai24c/bai24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bai24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yunyan
family: Bai
- given: Yuxing
family: Liu
- given: Luo
family: Luo
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2392-2417
id: bai24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2392
lastpage: 2417
published: 2024-07-08 00:00:00 +0000
- title: 'Constrained Ensemble Exploration for Unsupervised Skill Discovery'
abstract: 'Unsupervised Reinforcement Learning (RL) provides a promising paradigm for learning useful behaviors via reward-free pre-training. Existing methods for unsupervised RL mainly conduct empowerment-driven skill discovery or entropy-based exploration. However, empowerment often leads to static skills, and pure exploration only maximizes the state coverage rather than learning useful behaviors. In this paper, we propose a novel unsupervised RL framework via an ensemble of skills, where each skill performs partition exploration based on the state prototypes. Thus, each skill can explore the clustered area locally, and the ensemble skills maximize the overall state coverage. We adopt state-distribution constraints for the skill occupancy and the desired cluster for learning distinguishable skills. Theoretical analysis is provided for the state entropy and the resulting skill distributions. Based on extensive experiments on several challenging tasks, we find our method learns well-explored ensemble skills and achieves superior performance in various downstream tasks compared to previous methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/bai24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bai24d/bai24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bai24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chenjia
family: Bai
- given: Rushuai
family: Yang
- given: Qiaosheng
family: Zhang
- given: Kang
family: Xu
- given: Yi
family: Chen
- given: Ting
family: Xiao
- given: Xuelong
family: Li
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2418-2442
id: bai24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2418
lastpage: 2442
published: 2024-07-08 00:00:00 +0000
- title: 'Image Hijacks: Adversarial Images can Control Generative Models at Runtime'
abstract: 'Are foundation models secure against malicious actors? In this work, we focus on the image input to a vision-language model (VLM). We discover image hijacks, adversarial images that control the behaviour of VLMs at inference time, and introduce the general Behaviour Matching algorithm for training image hijacks. From this, we derive the Prompt Matching method, allowing us to train hijacks matching the behaviour of an arbitrary user-defined text prompt (e.g. ’the Eiffel Tower is now located in Rome’) using a generic, off-the-shelf dataset unrelated to our choice of prompt. We use Behaviour matching to craft hijacks for four types of attack: forcing VLMs to generate outputs of the adversary’s choice, leak information from their context window, override their safety training, and believe false statements. We study these attacks against LLaVA, a state-of-the-art VLM based on CLIP and LLaMA-2, and find that all attack types achieve a success rate of over 80%. Moreover, our attacks are automated and require only small image perturbations.'
volume: 235
URL: https://proceedings.mlr.press/v235/bailey24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bailey24a/bailey24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bailey24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Luke
family: Bailey
- given: Euan
family: Ong
- given: Stuart
family: Russell
- given: Scott
family: Emmons
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2443-2455
id: bailey24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2443
lastpage: 2455
published: 2024-07-08 00:00:00 +0000
- title: 'An Explicit Frame Construction for Normalizing 3D Point Clouds'
abstract: 'Many real-world datasets are represented as 3D point clouds – yet they often lack a predefined reference frame, posing a challenge for machine learning or general data analysis. Traditional methods for determining reference frames and normalizing 3D point clouds often struggle with specific inputs, lack theoretical guarantees, or require massive data. We introduce a new algorithm that overcomes these limitations and guarantees both universality and compatibility with any learnable framework for 3D point cloud analysis. Our algorithm works with any input point cloud and performs consistently regardless of input complexities, unlike data-driven methods that are susceptible to biases or limited training data. Empirically, our algorithm outperforms existing methods in effectiveness and generalizability across diverse benchmark datasets. Code is available at https://github.com/Utah-Math-Data-Science/alignment.'
volume: 235
URL: https://proceedings.mlr.press/v235/baker24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/baker24a/baker24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-baker24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Justin
family: Baker
- given: Shih-Hsin
family: Wang
- given: Tommaso
family: De Fernex
- given: Bao
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2456-2473
id: baker24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2456
lastpage: 2473
published: 2024-07-08 00:00:00 +0000
- title: 'Disentanglement Learning via Topology'
abstract: 'We propose TopDis (Topological Disentanglement), a method for learning disentangled representations via adding a multi-scale topological loss term. Disentanglement is a crucial property of data representations substantial for the explainability and robustness of deep learning models and a step towards high-level cognition. The state-of-the-art methods are based on VAE and encourage the joint distribution of latent variables to be factorized. We take a different perspective on disentanglement by analyzing topological properties of data manifolds. In particular, we optimize the topological similarity for data manifolds traversals. To the best of our knowledge, our paper is the first one to propose a differentiable topological loss for disentanglement learning. Our experiments have shown that the proposed TopDis loss improves disentanglement scores such as MIG, FactorVAE score, SAP score, and DCI disentanglement score with respect to state-of-the-art results while preserving the reconstruction quality. Our method works in an unsupervised manner, permitting us to apply it to problems without labeled factors of variation. The TopDis loss works even when factors of variation are correlated. Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.'
volume: 235
URL: https://proceedings.mlr.press/v235/balabin24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/balabin24a/balabin24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-balabin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nikita
family: Balabin
- given: Daria
family: Voronkova
- given: Ilya
family: Trofimov
- given: Evgeny
family: Burnaev
- given: Serguei
family: Barannikov
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2474-2504
id: balabin24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2474
lastpage: 2504
published: 2024-07-08 00:00:00 +0000
- title: 'Adversarial Attacks on Combinatorial Multi-Armed Bandits'
abstract: 'We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a sufficient and necessary condition for the attackability of CMAB, a notion to capture the vulnerability and robustness of CMAB. The attackability condition depends on the intrinsic properties of the corresponding CMAB instance such as the reward distributions of super arms and outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice and a general attack strategy for any CMAB instance does not exist since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications including probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.'
volume: 235
URL: https://proceedings.mlr.press/v235/balasubramanian24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/balasubramanian24a/balasubramanian24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-balasubramanian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rishab
family: Balasubramanian
- given: Jiawei
family: Li
- given: Prasad
family: Tadepalli
- given: Huazheng
family: Wang
- given: Qingyun
family: Wu
- given: Haoyu
family: Zhao
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2505-2526
id: balasubramanian24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2505
lastpage: 2526
published: 2024-07-08 00:00:00 +0000
- title: 'Memory Consolidation Enables Long-Context Video Understanding'
abstract: 'Most transformer-based video encoders are limited to short temporal contexts due to their quadratic complexity. While various attempts have been made to extend this context, this has often come at the cost of both conceptual and computational complexity. We propose to instead re-purpose existing pre-trained video transformers by simply fine-tuning them to attend to memories derived non-parametrically from past activations. By leveraging redundancy reduction, our memory-consolidated vision transformer (MC-ViT) effortlessly extends its context far into the past and exhibits excellent scaling behavior when learning from longer videos. In doing so, MC-ViT sets a new state-of-the-art in long-context video understanding on EgoSchema, Perception Test, and Diving48, outperforming methods that benefit from orders of magnitude more parameters.'
volume: 235
URL: https://proceedings.mlr.press/v235/balazevic24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/balazevic24a/balazevic24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-balazevic24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ivana
family: Balazevic
- given: Yuge
family: Shi
- given: Pinelopi
family: Papalampidi
- given: Rahma
family: Chaabouni
- given: Skanda
family: Koppula
- given: Olivier J
family: Henaff
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2527-2542
id: balazevic24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2527
lastpage: 2542
published: 2024-07-08 00:00:00 +0000
- title: 'Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation'
abstract: 'Large Language Models (LLMs) drive current AI breakthroughs despite very little being known about their internal representations. In this work, we propose to shed light on LLMs’ inner mechanisms through the lens of geometry. In particular, we develop in closed form $(i)$ the intrinsic dimension in which the Multi-Head Attention embeddings are constrained to exist and $(ii)$ the partition and per-region affine mappings of the feedforward (MLP) network of LLMs’ layers. Our theoretical findings further enable the design of novel principled solutions applicable to state-of-the-art LLMs. First, we show that, through our geometric understanding, we can bypass LLMs’ RLHF protection by controlling the embedding’s intrinsic dimension through informed prompt manipulation. Second, we derive interpretable geometrical features that can be extracted from any (pre-trained) LLM, providing a rich abstract representation of their inputs. We observe that these features are sufficient to help solve toxicity detection, and even allow the identification of various types of toxicity. Our results demonstrate how, even in large-scale regimes, exact theoretical results can answer practical questions in LLMs. Code: https://github.com/RandallBalestriero/SplineLLM'
volume: 235
URL: https://proceedings.mlr.press/v235/balestriero24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/balestriero24a/balestriero24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-balestriero24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Randall
family: Balestriero
- given: Romain
family: Cosentino
- given: Sarath
family: Shekkizhar
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2543-2565
id: balestriero24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2543
lastpage: 2565
published: 2024-07-08 00:00:00 +0000
- title: 'How Learning by Reconstruction Produces Uninformative Features For Perception'
abstract: 'Input space reconstruction is an attractive representation learning paradigm. Despite interpretability benefit of reconstruction and generation, we identify a misalignment between learning to reconstruct, and learning for perception. We show that the former allocates a model’s capacity towards a subspace of the data explaining the observed variance–a subspace with uninformative features for the latter. For example, the supervised TinyImagenet task with images projected onto the top subspace explaining 90% of the pixel variance can be solved with 45% test accuracy. Using the bottom subspace instead, accounting for only 20% of the pixel variance, reaches 55% test accuracy. Learning by reconstruction is also wasteful as the features for perception are learned last, pushing the need for long training schedules. We finally prove that learning by denoising can alleviate that misalignment for some noise strategies, e.g., masking. While tuning the noise strategy without knowledge of the perception task seems challenging, we provide a solution to detect if a noise strategy is never beneficial regardless of the perception task, e.g., additive Gaussian noise.'
volume: 235
URL: https://proceedings.mlr.press/v235/balestriero24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/balestriero24b/balestriero24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-balestriero24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Randall
family: Balestriero
- given: Yann
family: Lecun
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2566-2585
id: balestriero24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2566
lastpage: 2585
published: 2024-07-08 00:00:00 +0000
- title: 'Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better'
abstract: 'Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a vertex of maximum degree in an auxiliary graph and forming a cluster around it. One of these algorithms relies on solving a linear program. Our final contribution is to design a new and purely combinatorial approach for doing so that is far more scalable in theory and practice.'
volume: 235
URL: https://proceedings.mlr.press/v235/balmaseda24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/balmaseda24a/balmaseda24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-balmaseda24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vicente
family: Balmaseda
- given: Ying
family: Xu
- given: Yixin
family: Cao
- given: Nate
family: Veldt
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2586-2606
id: balmaseda24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2586
lastpage: 2606
published: 2024-07-08 00:00:00 +0000
- title: 'A Field Guide for Pacing Budget and ROS Constraints'
abstract: 'Budget pacing is a popular service that has been offered by major internet advertising platforms since their inception. In the past few years, autobidding products that provide real-time bidding as a service to advertisers have seen a prominent rise in adoption. A popular autobidding strategy is value maximization subject to return-on-spend (ROS) constraints. For historical or business reasons, the systems that govern these two services, namely budget pacing and ROS pacing, are not necessarily always a single unified and coordinated entity that optimizes a global objective subject to both constraints. The purpose of this work is to theoretically and empirically compare algorithms with different degrees of coordination between these two pacing systems. In particular, we compare (a) a fully-decoupled sequential algorithm; (b) a minimally-coupled min-pacing algorithm; (c) a fully-coupled dual-based algorithm. Our main contribution is to theoretically analyze the min-pacing algorithm and show that it attains similar guarantees to the fully-coupled canonical dual-based algorithm. On the other hand, we show that the sequential algorithm, even though appealing by virtue of being fully decoupled, could badly violate the constraints. We validate our theoretical findings empirically by showing that the min-pacing algorithm performs almost as well as the canonical dual-based algorithm on a semi-synthetic dataset that was generated from a large online advertising platform’s auction data.'
volume: 235
URL: https://proceedings.mlr.press/v235/balseiro24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/balseiro24a/balseiro24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-balseiro24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Santiago R.
family: Balseiro
- given: Kshipra
family: Bhawalkar
- given: Zhe
family: Feng
- given: Haihao
family: Lu
- given: Vahab
family: Mirrokni
- given: Balasubramanian
family: Sivan
- given: Di
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2607-2638
id: balseiro24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2607
lastpage: 2638
published: 2024-07-08 00:00:00 +0000
- title: 'On the Identifiability of Switching Dynamical Systems'
abstract: 'The identifiability of latent variable models has received increasing attention due to its relevance in interpretability and out-of-distribution generalisation. In this work, we study the identifiability of Switching Dynamical Systems, taking an initial step toward extending identifiability analysis to sequential latent variable models. We first prove the identifiability of Markov Switching Models, which commonly serve as the prior distribution for the continuous latent variables in Switching Dynamical Systems. We present identification conditions for first-order Markov dependency structures, whose transition distribution is parametrised via non-linear Gaussians. We then establish the identifiability of the latent variables and non-linear mappings in Switching Dynamical Systems up to affine transformations, by leveraging identifiability analysis techniques from identifiable deep latent variable models. We finally develop estimation algorithms for identifiable Switching Dynamical Systems. Throughout empirical studies, we demonstrate the practicality of identifiable Switching Dynamical Systems for segmenting high-dimensional time series such as videos, and showcase the use of identifiable Markov Switching Models for regime-dependent causal discovery in climate data.'
volume: 235
URL: https://proceedings.mlr.press/v235/balsells-rodas24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/balsells-rodas24a/balsells-rodas24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-balsells-rodas24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Carles
family: Balsells-Rodas
- given: Yixin
family: Wang
- given: Yingzhen
family: Li
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2639-2672
id: balsells-rodas24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2639
lastpage: 2672
published: 2024-07-08 00:00:00 +0000
- title: 'Analyzing $D^\alpha$ seeding for $k$-means'
abstract: 'One of the most popular clustering algorithms is the celebrated $D^\alpha$ seeding algorithm (also known as $k$-means++ when $\alpha=2$) by Arthur and Vassilvitskii (2007), who showed that it guarantees in expectation an $O(2^{2\alpha}\cdot \log k)$-approximate solution to the ($k$,$\alpha$)-clustering cost (where distances are raised to the power $\alpha$) for any $\alpha\ge 1$. More recently, Balcan, Dick, and White (2018) observed experimentally that using $D^\alpha$ seeding with $\alpha>2$ can lead to a better solution with respect to the standard $k$-means objective (i.e. the $(k,2)$-clustering cost). In this paper, we provide a rigorous understanding of this phenomenon. For any $\alpha>2$, we show that $D^\alpha$ seeding guarantees in expectation an approximation factor of \begin{equation*} O_\alpha \left(\left(\frac{\sigma_{\textrm{max}}}{\sigma_{\textrm{min}}}\right)^{2-4/\alpha}\cdot (g_\alpha \cdot \min \lbrace\ell,\log k\rbrace)^{2/\alpha}\right) \end{equation*} with respect to the standard $k$-means cost of any underlying clustering; where $g_\alpha$ is a parameter capturing the concentration of the points in each cluster, $\sigma_{\textrm{max}}$ and $\sigma_{\textrm{min}}$ are the maximum and minimum standard deviations of the clusters around their center, and $\ell$ is the number of distinct mixing weights in the underlying clustering (after rounding them to the nearest power of $2$). For instance, if the underlying clustering is defined by a mixture of $k$ Gaussian distributions with equal cluster variance (up to a constant factor), then our result implies that: (1) if there are a constant number of mixing weights, any constant $\alpha>2$ yields a constant-factor approximation; (2) if the mixing weights are arbitrary, any constant $\alpha>2$ yields an $O\left(\log^{2/\alpha}k\right)$-approximation, and $\alpha=\Theta(\log\log k)$ yields an $O(\log\log k)^3$-approximation. 
We complement these results by some lower bounds showing that the dependency on $g_\alpha$ and $\sigma_{\textrm{max}}/\sigma_{\textrm{min}}$ is tight. Finally, we provide an experimental validation of the effects of the aforementioned parameters when using $D^\alpha$ seeding.'
volume: 235
URL: https://proceedings.mlr.press/v235/bamas24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bamas24a/bamas24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bamas24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Etienne
family: Bamas
- given: Sai Ganesh
family: Nagarajan
- given: Ola
family: Svensson
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2673-2699
id: bamas24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2673
lastpage: 2699
published: 2024-07-08 00:00:00 +0000
- title: 'Parsimonious Learning-Augmented Approximations for Dense Instances of $\mathcal{NP}$-hard Problems'
abstract: 'The classical work of (Arora et al., 1999) provides a scheme that gives, for any $\epsilon>0$, a polynomial time $(1-\epsilon)$-approximation algorithm for dense instances of a family of $\mathcal{NP}$-hard problems, such as Max-CUT and Max-$k$-SAT. In this paper we extend and speed up this scheme using a logarithmic number of one-bit predictions. We propose a learning-augmented framework which aims at finding fast algorithms which guarantee approximation consistency, smoothness and robustness with respect to the prediction error. We provide such algorithms, which moreover use predictions parsimoniously, for dense instances of various optimization problems.'
volume: 235
URL: https://proceedings.mlr.press/v235/bampis24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bampis24a/bampis24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bampis24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Evripidis
family: Bampis
- given: Bruno
family: Escoffier
- given: Michalis
family: Xefteris
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2700-2714
id: bampis24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2700
lastpage: 2714
published: 2024-07-08 00:00:00 +0000
- title: 'Fair Resource Allocation in Multi-Task Learning'
abstract: 'By jointly learning multiple tasks, multi-task learning (MTL) can leverage the shared knowledge across tasks, resulting in improved data efficiency and generalization performance. However, a major challenge in MTL lies in the presence of conflicting gradients, which can hinder the fair optimization of some tasks and subsequently impede MTL’s ability to achieve better overall performance. Inspired by fair resource allocation in communication networks, we formulate the optimization of MTL as a utility maximization problem, where the loss decreases across tasks are maximized under different fairness measurements. To address the problem, we propose FairGrad, a novel optimization objective. FairGrad not only enables flexible emphasis on certain tasks but also achieves a theoretical convergence guarantee. Extensive experiments demonstrate that our method can achieve state-of-the-art performance among gradient manipulation methods on a suite of multi-task benchmarks in supervised learning and reinforcement learning. Furthermore, we incorporate the idea of $\alpha$-fairness into the loss functions of various MTL methods. Extensive empirical studies demonstrate that their performance can be significantly enhanced. Code is available at https://github.com/OptMN-Lab/fairgrad.'
volume: 235
URL: https://proceedings.mlr.press/v235/ban24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ban24a/ban24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ban24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hao
family: Ban
- given: Kaiyi
family: Ji
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2715-2731
id: ban24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2715
lastpage: 2731
published: 2024-07-08 00:00:00 +0000
- title: 'Linguistic Calibration of Long-Form Generations'
abstract: 'Language models (LMs) may lead their users to make suboptimal downstream decisions when they confidently hallucinate. This issue can be mitigated by having the LM verbally convey the probability that its claims are correct, but existing models cannot produce long-form text with calibrated confidence statements. Through the lens of decision-making, we define linguistic calibration for long-form generations: an LM is linguistically calibrated if its generations enable its users to make calibrated probabilistic predictions. This definition enables a training framework where a supervised finetuning step bootstraps an LM to emit long-form generations with confidence statements such as "I estimate a 30% chance of..." or "I am certain that...", followed by a reinforcement learning step which rewards generations that enable a user to provide calibrated answers to related questions. We linguistically calibrate Llama 2 7B and find in automated and human evaluations of long-form generations that it is significantly more calibrated than strong finetuned factuality baselines with comparable accuracy. These findings generalize under significant domain shifts to scientific and biomedical questions and to an entirely held-out person biography generation task. Our results demonstrate that long-form generations may be calibrated end-to-end by constructing an objective in the space of the predictions that users make in downstream decision-making.'
volume: 235
URL: https://proceedings.mlr.press/v235/band24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/band24a/band24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-band24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Neil
family: Band
- given: Xuechen
family: Li
- given: Tengyu
family: Ma
- given: Tatsunori
family: Hashimoto
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2732-2778
id: band24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2732
lastpage: 2778
published: 2024-07-08 00:00:00 +0000
- title: 'Relational DNN Verification With Cross Executional Bound Refinement'
abstract: 'We focus on verifying relational properties defined over deep neural networks (DNNs), such as robustness against universal adversarial perturbations (UAP), certified worst-case Hamming distance for binary string classifications, etc. Precise verification of these properties requires reasoning about multiple executions of the same DNN. However, most of the existing works in DNN verification only handle properties defined over single executions and, as a result, are imprecise for relational properties. Though a few recent works on relational DNN verification capture linear dependencies between the inputs of multiple executions, they do not leverage dependencies between the outputs of hidden layers, producing imprecise results. We develop a scalable relational verifier, RACoon, that utilizes cross-execution dependencies at all layers of the DNN, gaining substantial precision over SOTA baselines on a wide range of datasets, networks, and relational properties.'
volume: 235
URL: https://proceedings.mlr.press/v235/banerjee24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/banerjee24a/banerjee24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-banerjee24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Debangshu
family: Banerjee
- given: Gagandeep
family: Singh
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2779-2807
id: banerjee24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2779
lastpage: 2807
published: 2024-07-08 00:00:00 +0000
- title: 'A Dynamic Algorithm for Weighted Submodular Cover Problem'
abstract: 'We initiate the study of the submodular cover problem in a dynamic setting where the elements of the ground set are inserted and deleted. In the classical submodular cover problem, we are given a monotone submodular function $f : 2^{V} \to \mathbb{R}^{\ge 0}$ and the goal is to obtain a set $S \subseteq V$ that minimizes the cost subject to the constraint $f(S) = f(V)$. This is a classical problem in computer science and generalizes the Set Cover problem, 2-Set Cover, and dominating set problem among others. We consider this problem in a dynamic setting where there are updates to our set $V$, in the form of insertions and deletions of elements from a ground set $\mathcal{V}$, and the goal is to maintain an approximately optimal solution with low query complexity per update. For this problem, we propose a randomized algorithm that, in expectation, obtains a $(1-O(\epsilon), O(\epsilon^{-1}))$-bicriteria approximation using polylogarithmic query complexity per update.'
volume: 235
URL: https://proceedings.mlr.press/v235/banihashem24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/banihashem24a/banihashem24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-banihashem24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kiarash
family: Banihashem
- given: Samira
family: Goudarzi
- given: Mohammadtaghi
family: Hajiaghayi
- given: Peyman
family: Jabbarzade
- given: Morteza
family: Monemizadeh
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2808-2830
id: banihashem24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2808
lastpage: 2830
published: 2024-07-08 00:00:00 +0000
- title: 'Dynamic Metric Embedding into $\ell_p$ Space'
abstract: 'We give the first non-trivial decremental dynamic embedding of a weighted, undirected graph $G$ into $\ell_p$ space. Given a weighted graph $G$ undergoing a sequence of edge weight increases, the goal of this problem is to maintain a (randomized) mapping $\phi: (G,d) \to (X,\ell_p)$ from the set of vertices of the graph to the $\ell_p$ space such that for every pair of vertices $u$ and $v$, the expected distance between $\phi(u)$ and $\phi(v)$ in the $\ell_p$ metric is within a small multiplicative factor, referred to as the distortion, of their distance in $G$. Our main result is a dynamic algorithm with expected distortion $O(\log^2 n)$ and total update time $O\left((m^{1+o(1)} \log^2 W + Q)\log(nW) \right)$, where $W$ is the maximum weight of the edges, $Q$ is the total number of updates and $n, m$ denote the number of vertices and edges in $G$ respectively. This is the first result of its kind, extending the seminal result of Bourgain ’85 to the expanding field of dynamic algorithms. Moreover, we demonstrate that in the fully dynamic regime, where we tolerate edge insertions as well as deletions, no algorithm can explicitly maintain an embedding into $\ell_p$ space that has a low distortion with high probability.'
volume: 235
URL: https://proceedings.mlr.press/v235/banihashem24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/banihashem24b/banihashem24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-banihashem24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kiarash
family: Banihashem
- given: Mohammadtaghi
family: Hajiaghayi
- given: Dariusz Rafal
family: Kowalski
- given: Jan
family: Olkowski
- given: Max
family: Springer
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2831-2845
id: banihashem24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2831
lastpage: 2845
published: 2024-07-08 00:00:00 +0000
- title: 'VNN: Verification-Friendly Neural Networks with Hard Robustness Guarantees'
abstract: 'Machine learning techniques often lack formal correctness guarantees, evidenced by the widespread adversarial examples that plague most deep-learning applications. This lack of formal guarantees resulted in several research efforts that aim at verifying Deep Neural Networks (DNNs), with a particular focus on safety-critical applications. However, formal verification techniques still face major scalability and precision challenges. The over-approximation introduced during the formal verification process to tackle the scalability challenge often results in inconclusive analysis. To address this challenge, we propose a novel framework to generate Verification-Friendly Neural Networks (VNNs). We present a post-training optimization framework to achieve a balance between preserving prediction performance and verification-friendliness. Our proposed framework results in VNNs that are comparable to the original DNNs in terms of prediction performance, while amenable to formal verification techniques. This essentially enables us to establish robustness for more VNNs than their DNN counterparts, in a time-efficient manner.'
volume: 235
URL: https://proceedings.mlr.press/v235/baninajjar24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/baninajjar24a/baninajjar24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-baninajjar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anahita
family: Baninajjar
- given: Ahmed
family: Rezine
- given: Amir
family: Aminifar
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2846-2856
id: baninajjar24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2846
lastpage: 2856
published: 2024-07-08 00:00:00 +0000
- title: 'Provable Benefits of Local Steps in Heterogeneous Federated Learning for Neural Networks: A Feature Learning Perspective'
abstract: 'Local steps are crucial for Federated Learning (FL) algorithms and have witnessed great empirical success in reducing communication costs and improving the generalization performance of deep neural networks. However, there are limited studies on the effect of local steps on heterogeneous FL. A few works investigate this problem from the optimization perspective. Woodworth et al. (2020a) showed that the iteration complexity of Local SGD, the most popular FL algorithm, is dominated by the baseline mini-batch SGD, which does not show the benefits of local steps. In addition, Levy (2023) proposed a new local update method that provably benefits over mini-batch SGD. However, in the same setting, there is still no work analyzing the effects of local steps to generalization in a heterogeneous FL setting. Motivated by our experimental findings where Local SGD learns more distinguishing features than parallel SGD, this paper studies the generalization benefits of local steps from a feature learning perspective. We propose a novel federated data model that exhibits a new form of data heterogeneity, under which we show that a convolutional neural network (CNN) trained by GD with *global* updates will miss some pattern-related features, while the network trained by GD with *local* updates can learn all features in polynomial time. Consequently, local steps help CNN generalize better in our data model. In a different parameter setting, we also prove that Local GD with *one-shot* model averaging can learn all features and generalize well in all clients. Our experimental results also confirm the benefits of local steps in improving test accuracy on real-world data.'
volume: 235
URL: https://proceedings.mlr.press/v235/bao24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bao24a/bao24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bao24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yajie
family: Bao
- given: Michael
family: Crawshaw
- given: Mingrui
family: Liu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2857-2902
id: bao24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2857
lastpage: 2902
published: 2024-07-08 00:00:00 +0000
- title: 'Self-attention Networks Localize When QK-eigenspectrum Concentrates'
abstract: 'The self-attention mechanism prevails in modern machine learning. It has an interesting functionality of adaptively selecting tokens from an input sequence by modulating the degree of attention localization, which many researchers speculate is the basis of the powerful model performance but complicates the underlying mechanism of the learning dynamics. In recent years, mainly two arguments have connected attention localization to the model performances. One is the rank collapse, where the embedded tokens by a self-attention block become very similar across different tokens, leading to a less expressive network. The other is the entropy collapse, where the attention probability approaches non-uniform and entails low entropy, making the learning dynamics more likely to be trapped in plateaus. These two failure modes may apparently contradict each other because the rank and entropy collapses are relevant to uniform and non-uniform attention, respectively. To this end, we characterize the notion of attention localization by the eigenspectrum of query-key parameter matrices and reveal that a small eigenspectrum variance leads attention to be localized. Interestingly, the small eigenspectrum variance prevents both rank and entropy collapse, leading to better model expressivity and trainability.'
volume: 235
URL: https://proceedings.mlr.press/v235/bao24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bao24b/bao24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bao24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Han
family: Bao
- given: Ryuichiro
family: Hataya
- given: Ryo
family: Karakida
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2903-2922
id: bao24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2903
lastpage: 2922
published: 2024-07-08 00:00:00 +0000
- title: 'Graph Out-of-Distribution Detection Goes Neighborhood Shaping'
abstract: 'Despite the rich line of research works on out-of-distribution (OOD) detection on images, the literature on OOD detection for interdependent data, e.g., graphs, is still relatively limited. To fill this gap, we introduce TopoOOD as a principled approach that accommodates graph topology and neighborhood context for detecting OOD node instances on graphs. Meanwhile, we enrich the experiment settings by splitting in-distribution (ID) and OOD data based on distinct topological distributions, which presents new benchmarks for a more comprehensive analysis of graph-based OOD detection. The latter is designed to thoroughly assess the performance of these discriminators under distribution shifts involving structural information, providing a rigorous evaluation of methods in the emerging area of OOD detection on graphs. Our experimental results show the competitiveness of the proposed model across multiple datasets, as evidenced by up to a 15% increase in the AUROC and a 50% decrease in the FPR compared to existing state-of-the-art methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/bao24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bao24c/bao24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bao24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tianyi
family: Bao
- given: Qitian
family: Wu
- given: Zetian
family: Jiang
- given: Yiting
family: Chen
- given: Jiawei
family: Sun
- given: Junchi
family: Yan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2923-2943
id: bao24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2923
lastpage: 2943
published: 2024-07-08 00:00:00 +0000
- title: 'Stochastic positional embeddings improve masked image modeling'
abstract: 'Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose to incorporate location uncertainty into MIM by using stochastic positional embeddings (StoP). Specifically, we condition the model on stochastic masked token positions drawn from a Gaussian distribution. We show that using StoP reduces overfitting to location features and guides the model toward learning features that are more robust to location uncertainties. Quantitatively, using StoP improves downstream MIM performance on a variety of downstream tasks. For example, linear probing on ImageNet using ViT-B is improved by $+1.7\%$, and by $2.5\%$ for ViT-H using 1% of the data.'
volume: 235
URL: https://proceedings.mlr.press/v235/bar24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bar24a/bar24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amir
family: Bar
- given: Florian
family: Bordes
- given: Assaf
family: Shocher
- given: Mido
family: Assran
- given: Pascal
family: Vincent
- given: Nicolas
family: Ballas
- given: Trevor
family: Darrell
- given: Amir
family: Globerson
- given: Yann
family: Lecun
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2944-2958
id: bar24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2944
lastpage: 2958
published: 2024-07-08 00:00:00 +0000
- title: 'Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph Products'
abstract: 'In the realm of Graph Neural Networks (GNNs), two exciting research directions have recently emerged: Subgraph GNNs and Graph Transformers. In this paper, we propose an architecture that integrates both approaches, dubbed *Subgraphormer*, which combines the enhanced expressive power, message-passing mechanisms, and aggregation schemes from Subgraph GNNs with attention and positional encodings, arguably the most important components in Graph Transformers. Our method is based on an intriguing new connection we reveal between Subgraph GNNs and product graphs, suggesting that Subgraph GNNs can be formulated as Message Passing Neural Networks (MPNNs) operating on a product of the graph with itself. We use this formulation to design our architecture: first, we devise an attention mechanism based on the connectivity of the product graph. Following this, we propose a novel and efficient positional encoding scheme for Subgraph GNNs, which we derive as a positional encoding for the product graph. Our experimental results demonstrate significant performance improvements over both Subgraph GNNs and Graph Transformers on a wide range of datasets.'
volume: 235
URL: https://proceedings.mlr.press/v235/bar-shalom24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bar-shalom24a/bar-shalom24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bar-shalom24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Guy
family: Bar-Shalom
- given: Beatrice
family: Bevilacqua
- given: Haggai
family: Maron
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2959-2989
id: bar-shalom24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2959
lastpage: 2989
published: 2024-07-08 00:00:00 +0000
- title: 'Scale-Free Image Keypoints Using Differentiable Persistent Homology'
abstract: 'In computer vision, keypoint detection is a fundamental task, with applications spanning from robotics to image retrieval; however, existing learning-based methods suffer from scale dependency, and lack flexibility. This paper introduces a novel approach that leverages Morse theory and persistent homology, powerful tools rooted in algebraic topology. We propose a novel loss function based on the recent introduction of a notion of subgradient in persistent homology, paving the way towards topological learning. Our detector, MorseDet, is the first topology-based learning model for feature detection, which achieves competitive performance in keypoint repeatability and introduces a principled and theoretically robust approach to the problem.'
volume: 235
URL: https://proceedings.mlr.press/v235/barbarani24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/barbarani24a/barbarani24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-barbarani24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Giovanni
family: Barbarani
- given: Francesco
family: Vaccarino
- given: Gabriele
family: Trivigno
- given: Marco
family: Guerra
- given: Gabriele
family: Berton
- given: Carlo
family: Masone
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 2990-3002
id: barbarani24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 2990
lastpage: 3002
published: 2024-07-08 00:00:00 +0000
- title: 'To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models'
abstract: 'LLMs have been found to memorize training textual sequences and regurgitate verbatim said sequences during text generation time. This fact is known to be the cause of privacy and related (e.g., copyright) problems. Unlearning in LLMs then takes the form of devising new algorithms that will properly deal with these side-effects of memorized data, while not hurting the model’s utility. We offer a fresh perspective towards this goal, namely, that each textual sequence to be forgotten should be treated differently when being unlearned based on its degree of memorization within the LLM. We contribute a new metric for measuring unlearning quality, an adversarial attack showing that SOTA algorithms lacking this perspective fail for privacy, and two new unlearning methods based on Gradient Ascent and Task Arithmetic, respectively. A comprehensive performance evaluation across an extensive suite of NLP tasks then mapped the solution space, identifying the best solutions under different scales in model capacities and forget set sizes and quantified the gains of the new approaches.'
volume: 235
URL: https://proceedings.mlr.press/v235/barbulescu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/barbulescu24a/barbulescu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-barbulescu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: George-Octavian
family: Bărbulescu
- given: Peter
family: Triantafillou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3003-3023
id: barbulescu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3003
lastpage: 3023
published: 2024-07-08 00:00:00 +0000
- title: 'Sliding Down the Stairs: How Correlated Latent Variables Accelerate Learning with Neural Networks'
abstract: 'Neural networks extract features from data using stochastic gradient descent (SGD). In particular, higher-order input cumulants (HOCs) are crucial for their performance. However, extracting information from the $p$th cumulant of $d$-dimensional inputs is computationally hard: the number of samples required to recover a single direction from an order-$p$ tensor (tensor PCA) using SGD grows as $d^{p-1}$, which is prohibitive for high-dimensional inputs. This result raises the question of how neural networks extract relevant directions from the HOCs of their inputs efficiently. Here, we show that correlations between latent variables along the directions encoded in different input cumulants speed up learning from higher-order correlations. We show this effect analytically by deriving nearly sharp thresholds for the number of samples required by a single neuron to recover these directions using online SGD from a random start in high dimensions. Our analytical results are confirmed in simulations of two-layer neural networks and unveil a new mechanism for hierarchical learning in neural networks.'
volume: 235
URL: https://proceedings.mlr.press/v235/bardone24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bardone24a/bardone24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bardone24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lorenzo
family: Bardone
- given: Sebastian
family: Goldt
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3024-3045
id: bardone24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3024
lastpage: 3045
published: 2024-07-08 00:00:00 +0000
- title: 'Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies'
abstract: 'This paper revisits the simple, long-studied, yet still unsolved problem of making image classifiers robust to imperceptible perturbations. Taking CIFAR10 as an example, SOTA clean accuracy is about $100$%, but SOTA robustness to $\ell_{\infty}$-norm bounded perturbations barely exceeds $70$%. To understand this gap, we analyze how model size, dataset size, and synthetic data quality affect robustness by developing the first scaling laws for adversarial training. Our scaling laws reveal inefficiencies in prior art and provide actionable feedback to advance the field. For instance, we discovered that SOTA methods diverge notably from compute-optimal setups, using excess compute for their level of robustness. Leveraging a compute-efficient setup, we surpass the prior SOTA with $20$% ($70$%) fewer training (inference) FLOPs. We trained various compute-efficient models, with our best achieving $74$% AutoAttack accuracy ($+3$% gain). However, our scaling laws also predict robustness slowly grows then plateaus at $90$%: dwarfing our new SOTA by scaling is impractical, and perfect robustness is impossible. To better understand this predicted limit, we carry out a small-scale human evaluation on the AutoAttack data that fools our top-performing model. Concerningly, we estimate that human performance also plateaus near $90$%, which we show to be attributable to $\ell_{\infty}$-constrained attacks’ generation of invalid images not consistent with their original labels. Having characterized limiting roadblocks, we outline promising paths for future research.'
volume: 235
URL: https://proceedings.mlr.press/v235/bartoldson24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bartoldson24a/bartoldson24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bartoldson24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brian R.
family: Bartoldson
- given: James
family: Diffenderfer
- given: Konstantinos
family: Parasyris
- given: Bhavya
family: Kailkhura
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3046-3072
id: bartoldson24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3046
lastpage: 3072
published: 2024-07-08 00:00:00 +0000
- title: 'Neural Diffusion Models'
abstract: 'Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow linear transformation of the data distribution. In contrast, a broader family of transformations can help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true negative log-likelihood and the variational approximation. In this paper, we present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs through experiments on many image generation benchmarks, including MNIST, CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood, achieving state-of-the-art results on ImageNet and CelebA-HQ, and produce high-quality samples.'
volume: 235
URL: https://proceedings.mlr.press/v235/bartosh24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bartosh24a/bartosh24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bartosh24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Grigory
family: Bartosh
- given: Dmitry
family: Vetrov
- given: Christian A.
family: Naesseth
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3073-3095
id: bartosh24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3073
lastpage: 3095
published: 2024-07-08 00:00:00 +0000
- title: 'Generalization in Kernel Regression Under Realistic Assumptions'
abstract: 'It is by now well-established that modern over-parameterized models seem to elude the bias-variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to analyze this phenomenon in the relatively tractable setting of kernel regression. However, as we argue in detail, most past works on this topic either make unrealistic assumptions, or focus on a narrow problem setup. This work aims to provide a unified theory to upper bound the excess risk of kernel regression for nearly all common and realistic settings. When applied to common kernels, our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression. As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime. Our results rely on new relative perturbation bounds for the eigenvalues of kernel matrices, which may be of independent interest. These reveal a self-regularization phenomenon, whereby a heavy tail in the eigendecomposition of the kernel implicitly leads to good generalization.'
volume: 235
URL: https://proceedings.mlr.press/v235/barzilai24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/barzilai24a/barzilai24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-barzilai24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniel
family: Barzilai
- given: Ohad
family: Shamir
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3096-3132
id: barzilai24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3096
lastpage: 3132
published: 2024-07-08 00:00:00 +0000
- title: 'Local vs. Global Interpretability: A Computational Complexity Perspective'
abstract: 'The local and global interpretability of various ML models has been studied extensively in recent years. However, despite significant progress in the field, many known results remain informal or lack sufficient mathematical rigor. We propose a framework for bridging this gap, by using computational complexity theory to assess local and global perspectives of interpreting ML models. We begin by proposing proofs for two novel insights that are essential for our analysis: (1) a duality between local and global forms of explanations; and (2) the inherent uniqueness of certain global explanation forms. We then use these insights to evaluate the complexity of computing explanations, across three model types representing the extremes of the interpretability spectrum: (1) linear models; (2) decision trees; and (3) neural networks. Our findings offer insights into both the local and global interpretability of these models. For instance, under standard complexity assumptions such as P != NP, we prove that selecting *global* sufficient subsets in linear models is computationally harder than selecting *local* subsets. Interestingly, with neural networks and decision trees, the opposite is true: it is harder to carry out this task locally than globally. We believe that our findings demonstrate how examining explainability through a computational complexity lens can help us develop a more rigorous grasp of the inherent interpretability of ML models.'
volume: 235
URL: https://proceedings.mlr.press/v235/bassan24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bassan24a/bassan24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bassan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shahaf
family: Bassan
- given: Guy
family: Amir
- given: Guy
family: Katz
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3133-3167
id: bassan24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3133
lastpage: 3167
published: 2024-07-08 00:00:00 +0000
- title: 'Differentially Private Domain Adaptation with Theoretical Guarantees'
abstract: 'In many applications, the labeled data at the learner’s disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain, somewhat close to the target domain. This is the modern problem of supervised domain adaptation from a public source to a private target domain. We present two $(\epsilon, \delta)$-differentially private adaptation algorithms for supervised adaptation, for which we make use of a general optimization problem, recently shown to benefit from favorable theoretical learning guarantees. Our first algorithm is designed for regression with linear predictors and shown to solve a convex optimization problem. Our second algorithm is a more general solution for loss functions that may be non-convex but Lipschitz and smooth. While our main objective is a theoretical analysis, we also report the results of several experiments. We first show that the non-private versions of our algorithms match state-of-the-art performance in supervised adaptation and that for larger values of the target sample size or $\epsilon$, the performance of our private algorithms remains close to that of their non-private counterparts.'
volume: 235
URL: https://proceedings.mlr.press/v235/bassily24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bassily24a/bassily24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bassily24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Raef
family: Bassily
- given: Corinna
family: Cortes
- given: Anqi
family: Mao
- given: Mehryar
family: Mohri
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3168-3196
id: bassily24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3168
lastpage: 3196
published: 2024-07-08 00:00:00 +0000
- title: 'A Statistical Framework for Data-dependent Retrieval-Augmented Models'
abstract: 'Modern ML systems increasingly augment input instances with additional relevant information to enhance final prediction. Despite growing interest in such retrieval-augmented models, their fundamental properties and training are not well understood. We propose a statistical framework to study such models with two components: 1) a retriever to identify the relevant information out of a large corpus via a data-dependent metric; and 2) a predictor that consumes the input instances along with the retrieved information to make the final predictions. We present a principled method for end-to-end training of both components and draw connections with various training approaches in the literature. Furthermore, we establish excess risk bounds for retrieval-augmented models while delineating the contributions of both retriever and predictor towards the model performance. We validate the utility of our proposed training methods, along with the key takeaways from our statistical analysis, on an open-domain question answering task where retrieval augmentation is important.'
volume: 235
URL: https://proceedings.mlr.press/v235/basu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/basu24a/basu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-basu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Soumya
family: Basu
- given: Ankit Singh
family: Rawat
- given: Manzil
family: Zaheer
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3197-3223
id: basu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3197
lastpage: 3223
published: 2024-07-08 00:00:00 +0000
- title: 'On Mechanistic Knowledge Localization in Text-to-Image Generative Models'
abstract: 'Identifying layers within text-to-image models which control visual attributes can facilitate efficient model editing through closed-form updates. Recent work leveraging causal tracing shows that early Stable-Diffusion variants confine knowledge primarily to the first layer of the CLIP text-encoder, while it diffuses throughout the UNet. Extending this framework, we observe that for recent models (e.g., SD-XL, DeepFloyd), causal tracing fails to pinpoint localized knowledge, highlighting challenges in model editing. To address this issue, we introduce the concept of mechanistic localization in text-to-image models, where knowledge about various visual attributes (e.g., "style", "objects", "facts") can be mechanistically localized to a small fraction of layers in the UNet, thus facilitating efficient model editing. We localize knowledge using our method LocoGen, which measures the direct effect of intermediate layers on output generation by performing interventions in the cross-attention layers of the UNet. We then apply LocoEdit, a fast closed-form editing method, across popular open-source text-to-image models (including the latest SD-XL) and explore the possibilities of neuron-level model editing. Using mechanistic localization, our work offers a better view of successes and failures in localization-based text-to-image model editing.'
volume: 235
URL: https://proceedings.mlr.press/v235/basu24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/basu24b/basu24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-basu24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Samyadeep
family: Basu
- given: Keivan
family: Rezaei
- given: Priyatham
family: Kattakinda
- given: Vlad I
family: Morariu
- given: Nanxuan
family: Zhao
- given: Ryan A.
family: Rossi
- given: Varun
family: Manjunatha
- given: Soheil
family: Feizi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3224-3265
id: basu24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3224
lastpage: 3265
published: 2024-07-08 00:00:00 +0000
- title: 'Monotone Individual Fairness'
abstract: 'We revisit the problem of online learning with individual fairness, where an online learner strives to maximize predictive accuracy while ensuring that similar individuals are treated similarly. We first extend the frameworks of Gillen et al. (2018); Bechavod et al. (2020), which rely on feedback from human auditors regarding fairness violations, to allow for auditing schemes that can aggregate feedback from any number of auditors, using a rich class we term monotone aggregation functions, for which we also prove a useful characterization. Using our generalized framework, we present an oracle-efficient algorithm guaranteeing a bound of $\mathcal{O}(T^\frac{3}{4})$ simultaneously for regret and number of fairness violations. We then study an online classification setting where label feedback is available for positively-predicted individuals only, and present an algorithm guaranteeing a bound of $\mathcal{O}(T^\frac{5}{6})$ simultaneously for regret and number of fairness violations. In both settings, our algorithms improve on the best known bounds for oracle-efficient algorithms. Furthermore, our algorithms offer significant improvements in computational efficiency, greatly reducing the number of required calls to an (offline) optimization oracle, as opposed to previous algorithms which required $T$ such calls every round.'
volume: 235
URL: https://proceedings.mlr.press/v235/bechavod24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bechavod24a/bechavod24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bechavod24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yahav
family: Bechavod
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3266-3283
id: bechavod24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3266
lastpage: 3283
published: 2024-07-08 00:00:00 +0000
- title: 'Graph Neural Networks Use Graphs When They Shouldn’t'
abstract: 'Predictions over graphs play a crucial role in various domains, including social networks and medicine. Graph Neural Networks (GNNs) have emerged as the dominant approach for learning on graph data. Although a graph-structure is provided as input to the GNN, in some cases the best solution can be obtained by ignoring it. While GNNs have the ability to ignore the graph-structure in such cases, it is not clear that they will. In this work, we show that GNNs actually tend to overfit the given graph-structure in the sense that they use it even when a better solution can be obtained by ignoring it. We analyze the implicit bias of gradient-descent learning of GNNs and prove that when the ground truth function does not use the graphs, GNNs are not guaranteed to learn a solution that ignores the graph, even with infinite data. We examine this phenomenon with respect to different graph distributions and find that regular graphs are more robust to this overfitting. We also prove that within the family of regular graphs, GNNs are guaranteed to extrapolate when learning with gradient descent. Finally, based on our empirical and theoretical findings, we demonstrate on real-data how regular graphs can be leveraged to reduce graph overfitting and enhance performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/bechler-speicher24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bechler-speicher24a/bechler-speicher24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bechler-speicher24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maya
family: Bechler-Speicher
- given: Ido
family: Amos
- given: Ran
family: Gilad-Bachrach
- given: Amir
family: Globerson
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3284-3304
id: bechler-speicher24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3284
lastpage: 3304
published: 2024-07-08 00:00:00 +0000
- title: 'Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations'
abstract: 'Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging. In particular, although ODEs are differentiable and would allow for gradient-based parameter optimization, the nonlinear dynamics of ODEs often lead to many local minima and extreme sensitivity to initial conditions. We therefore propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs. By iteratively reducing a noise parameter of the probabilistic integrator, the proposed method converges more reliably to the true parameters. We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin–Huxley model with a practically relevant number of parameters.'
volume: 235
URL: https://proceedings.mlr.press/v235/beck24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/beck24a/beck24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-beck24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonas
family: Beck
- given: Nathanael
family: Bosch
- given: Michael
family: Deistler
- given: Kyra L.
family: Kadhim
- given: Jakob H.
family: Macke
- given: Philipp
family: Hennig
- given: Philipp
family: Berens
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3305-3326
id: beck24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3305
lastpage: 3326
published: 2024-07-08 00:00:00 +0000
- title: 'Standardized Interpretable Fairness Measures for Continuous Risk Scores'
abstract: 'We propose a standardized version of fairness measures for continuous scores with a reasonable interpretation based on the Wasserstein distance. Our measures are easily computable and well suited for quantifying and interpreting the strength of group disparities as well as for comparing biases across different models, datasets, or time points. We derive a link between the different families of existing fairness measures for scores and show that the proposed standardized fairness measures outperform ROC-based fairness measures because they are more explicit and can quantify significant biases that ROC-based fairness measures miss.'
volume: 235
URL: https://proceedings.mlr.press/v235/becker24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/becker24a/becker24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-becker24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ann-Kristin
family: Becker
- given: Oana
family: Dumitrasc
- given: Klaus
family: Broelemann
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3327-3346
id: becker24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3327
lastpage: 3346
published: 2024-07-08 00:00:00 +0000
- title: 'Unsupervised Representation Learning of Brain Activity via Bridging Voxel Activity and Functional Connectivity'
abstract: 'Effective brain representation learning is a key step toward the understanding of cognitive processes and diagnosis of neurological diseases/disorders. Existing studies have focused on either (1) voxel-level activity, where only a single weight relating the voxel activity to the task (i.e., aggregation of voxel activity over a time window) is considered, missing their temporal dynamics, or (2) functional connectivity of the brain at the level of regions of interest, missing voxel-level activities. We bridge this gap and design BrainMixer, an unsupervised learning framework that effectively utilizes both functional connectivity and associated time series of voxels to learn voxel-level representations in an unsupervised manner. BrainMixer employs two simple yet effective MLP-based encoders to simultaneously learn the dynamics of voxel-level signals and their functional correlations. To encode voxel activity, BrainMixer fuses information across both time and voxel dimensions via a dynamic attention mechanism. To learn the structure of the functional connectivity, BrainMixer presents a temporal graph patching approach and encodes each patch by combining its nodes’ features via a new adaptive temporal pooling. Our experiments show that BrainMixer attains outstanding performance and outperforms 14 baselines in different downstream tasks and setups.'
volume: 235
URL: https://proceedings.mlr.press/v235/behrouz24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/behrouz24a/behrouz24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-behrouz24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ali
family: Behrouz
- given: Parsa
family: Delavari
- given: Farnoosh
family: Hashemi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3347-3381
id: behrouz24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3347
lastpage: 3381
published: 2024-07-08 00:00:00 +0000
- title: 'Neural Networks Learn Statistics of Increasing Complexity'
abstract: 'The *distributional simplicity bias* (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. We also extend the DSB to discrete domains by proving an equivalence between token $n$-gram frequencies and the moments of embedding vectors, and by finding empirical evidence for the bias in LLMs. Finally, we use optimal transport methods to surgically edit the low-order statistics of one class to match those of another, and show that early-training networks treat the edited samples as if they were drawn from the target class. Code is available at https://github.com/EleutherAI/features-across-time.'
volume: 235
URL: https://proceedings.mlr.press/v235/belrose24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/belrose24a/belrose24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-belrose24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nora
family: Belrose
- given: Quintin
family: Pope
- given: Lucia
family: Quirke
- given: Alex Troy
family: Mallen
- given: Xiaoli
family: Fern
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3382-3409
id: belrose24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3382
lastpage: 3409
published: 2024-07-08 00:00:00 +0000
- title: 'Accelerating Federated Learning with Quick Distributed Mean Estimation'
abstract: 'Distributed Mean Estimation (DME), in which $n$ clients communicate vectors to a parameter server that estimates their average, is a fundamental building block in communication-efficient federated learning. In this paper, we improve on previous DME techniques that achieve the optimal $O(1/n)$ Normalized Mean Squared Error (NMSE) guarantee by asymptotically improving the complexity for either encoding or decoding (or both). To achieve this, we formalize the problem in a novel way that allows us to use off-the-shelf mathematical solvers to design the quantization. Using various datasets and training tasks, we demonstrate how our method, QUIC-FL, achieves state-of-the-art accuracy with faster encoding and decoding times compared to other DME methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/ben-basat24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ben-basat24a/ben-basat24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ben-basat24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ran
family: Ben-Basat
- given: Shay
family: Vargaftik
- given: Amit
family: Portnoy
- given: Gil
family: Einziger
- given: Yaniv
family: Ben-Itzhak
- given: Michael
family: Mitzenmacher
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3410-3442
id: ben-basat24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3410
lastpage: 3442
published: 2024-07-08 00:00:00 +0000
- title: 'The Role of Learning Algorithms in Collective Action'
abstract: 'Collective action in machine learning is the study of the control that a coordinated group can have over machine learning algorithms. While previous research has concentrated on assessing the impact of collectives against Bayes (sub-)optimal classifiers, this perspective is limited in that it does not account for the choice of learning algorithm. Since classifiers seldom behave like Bayes classifiers and are influenced by the choice of learning algorithms along with their inherent biases, in this work we initiate the study of how the choice of the learning algorithm plays a role in the success of a collective in practical settings. Specifically, we focus on distributionally robust optimization (DRO), popular for improving a worst group error, and on the ubiquitous stochastic gradient descent (SGD), due to its inductive bias for "simpler" functions. Our empirical results, supported by a theoretical foundation, show that the effective size and success of the collective are highly dependent on properties of the learning algorithm. This highlights the necessity of taking the learning algorithm into account when studying the impact of collective action in machine learning.'
volume: 235
URL: https://proceedings.mlr.press/v235/ben-dov24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ben-dov24a/ben-dov24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ben-dov24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Omri
family: Ben-Dov
- given: Jake
family: Fawkes
- given: Samira
family: Samadi
- given: Amartya
family: Sanyal
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3443-3461
id: ben-dov24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3443
lastpage: 3461
published: 2024-07-08 00:00:00 +0000
- title: 'D-Flow: Differentiating through Flows for Controlled Generation'
abstract: 'Taming the generation outcome of state of the art Diffusion and Flow-Matching (FM) models without having to re-train a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general. In this work we introduce *D-Flow*, a simple framework for controlling the generation process by differentiating through the flow, optimizing for the source (noise) point. We motivate this framework by our key observation stating that for Diffusion/FM models trained with Gaussian probability paths, differentiating through the generation process projects gradient on the data manifold, implicitly injecting the prior into the optimization process. We validate our framework on linear and non-linear controlled generation problems including: image and audio inverse problems and conditional molecule generation reaching state of the art performance across all.'
volume: 235
URL: https://proceedings.mlr.press/v235/ben-hamu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ben-hamu24a/ben-hamu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ben-hamu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Heli
family: Ben-Hamu
- given: Omri
family: Puny
- given: Itai
family: Gat
- given: Brian
family: Karrer
- given: Uriel
family: Singer
- given: Yaron
family: Lipman
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3462-3483
id: ben-hamu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3462
lastpage: 3483
published: 2024-07-08 00:00:00 +0000
- title: 'Transitional Uncertainty with Layered Intermediate Predictions'
abstract: 'In this paper, we discuss feature engineering for single-pass uncertainty estimation. For accurate uncertainty estimates, neural networks must extract differences in the feature space that quantify uncertainty. This could be achieved by current single-pass approaches that maintain feature distances between data points as they traverse the network. While initial results are promising, maintaining feature distances within the network representations frequently inhibits information compression and opposes the learning objective. We study this effect theoretically and empirically to arrive at a simple conclusion: preserving feature distances in the output is beneficial when the preserved features contribute to learning the label distribution and act in opposition otherwise. We then propose Transitional Uncertainty with Layered Intermediate Predictions (TULIP) as a simple approach to address the shortcomings of current single-pass estimators. Specifically, we implement feature preservation by extracting features from intermediate representations before information is collapsed by subsequent layers. We refer to the underlying preservation mechanism as transitional feature preservation. We show that TULIP matches or outperforms current single-pass methods on standard benchmarks and in practical settings where these methods are less reliable (imbalances, complex architectures, medical modalities).'
volume: 235
URL: https://proceedings.mlr.press/v235/benkert24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/benkert24a/benkert24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-benkert24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ryan
family: Benkert
- given: Mohit
family: Prabhushankar
- given: Ghassan
family: Alregib
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3484-3505
id: benkert24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3484
lastpage: 3505
published: 2024-07-08 00:00:00 +0000
- title: 'Non-clairvoyant Scheduling with Partial Predictions'
abstract: 'The non-clairvoyant scheduling problem has gained new interest within learning-augmented algorithms, where the decision-maker is equipped with predictions without any quality guarantees. In practical settings, access to predictions may be reduced to specific instances, due to cost or data limitations. Our investigation focuses on scenarios where predictions for only $B$ job sizes out of $n$ are available to the algorithm. We first establish near-optimal lower bounds and algorithms in the case of perfect predictions. Subsequently, we present a learning-augmented algorithm satisfying the robustness, consistency, and smoothness criteria, and revealing a novel tradeoff between consistency and smoothness inherent in the scenario with a restricted number of predictions.'
volume: 235
URL: https://proceedings.mlr.press/v235/benomar24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/benomar24a/benomar24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-benomar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ziyad
family: Benomar
- given: Vianney
family: Perchet
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3506-3538
id: benomar24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3506
lastpage: 3538
published: 2024-07-08 00:00:00 +0000
- title: 'Sequential Disentanglement by Extracting Static Information From A Single Sequence Element'
abstract: 'One of the fundamental representation learning tasks is unsupervised sequential disentanglement, where latent codes of inputs are decomposed to a single static factor and a sequence of dynamic factors. To extract this latent information, existing methods condition the static and dynamic codes on the entire input sequence. Unfortunately, these models often suffer from information leakage, i.e., the dynamic vectors encode both static and dynamic information, or vice versa, leading to a non-disentangled representation. Attempts to alleviate this problem via reducing the dynamic dimension and auxiliary loss terms gain only partial success. Instead, we propose a novel and simple architecture that mitigates information leakage by offering a simple and effective subtraction inductive bias while conditioning on a single sample. Remarkably, the resulting variational framework is simpler in terms of required loss terms, hyper-parameters, and data augmentation. We evaluate our method on multiple data-modality benchmarks including general time series, video, and audio, and we show beyond state-of-the-art results on generation and prediction tasks in comparison to several strong baselines.'
volume: 235
URL: https://proceedings.mlr.press/v235/berman24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/berman24a/berman24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-berman24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nimrod
family: Berman
- given: Ilan
family: Naiman
- given: Idan
family: Arbiv
- given: Gal
family: Fadlon
- given: Omri
family: Azencot
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3539-3564
id: berman24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3539
lastpage: 3564
published: 2024-07-08 00:00:00 +0000
- title: 'CoLoRA: Continuous low-rank adaptation for reduced implicit neural modeling of parameterized partial differential equations'
abstract: 'This work introduces reduced models based on Continuous Low Rank Adaptation (CoLoRA) that pre-train neural networks for a given partial differential equation and then continuously adapt low-rank weights in time to rapidly predict the evolution of solution fields at new physics parameters and new initial conditions. The adaptation can be either purely data-driven or via an equation-driven variational approach that provides Galerkin-optimal approximations. Because CoLoRA approximates solution fields locally in time, the rank of the weights can be kept small, which means that only few training trajectories are required offline so that CoLoRA is well suited for data-scarce regimes. Predictions with CoLoRA are orders of magnitude faster than with classical methods and their accuracy and parameter efficiency is higher compared to other neural network approaches.'
volume: 235
URL: https://proceedings.mlr.press/v235/berman24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/berman24b/berman24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-berman24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jules
family: Berman
- given: Benjamin
family: Peherstorfer
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3565-3583
id: berman24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3565
lastpage: 3583
published: 2024-07-08 00:00:00 +0000
- title: 'By Tying Embeddings You Are Assuming the Distributional Hypothesis'
abstract: 'In this work, we analyze both theoretically and empirically the effect of tied input-output embeddings—a popular technique that reduces the model size while often improving training. Interestingly, we found that this technique is connected to Harris (1954)’s distributional hypothesis—often portrayed by the famous Firth (1957)’s quote “a word is characterized by the company it keeps”. Specifically, our findings indicate that words (or, more broadly, symbols) with similar semantics tend to be encoded in similar input embeddings, while words that appear in similar contexts are encoded in similar output embeddings (thus explaining the semantic space arising in input and output embedding of foundational language models). As a consequence of these findings, the tying of the input and output embeddings is encouraged only when the distributional hypothesis holds for the underlying data. These results also provide insight into the embeddings of foundation language models (which are known to be semantically organized). Further, we complement the theoretical findings with several experiments supporting the claims.'
volume: 235
URL: https://proceedings.mlr.press/v235/bertolotti24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bertolotti24a/bertolotti24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bertolotti24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Francesco
family: Bertolotti
- given: Walter
family: Cazzola
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3584-3610
id: bertolotti24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3584
lastpage: 3610
published: 2024-07-08 00:00:00 +0000
- title: 'Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning'
abstract: 'The study of behavioral diversity in Multi-Agent Reinforcement Learning (MARL) is a nascent yet promising field. In this context, the present work deals with the question of how to control the diversity of a multi-agent system. With no existing approaches to control diversity to a set value, current solutions focus on blindly promoting it via intrinsic rewards or additional loss functions, effectively changing the learning objective and lacking a principled measure for it. To address this, we introduce Diversity Control (DiCo), a method able to control diversity to an exact value of a given metric by representing policies as the sum of a parameter-shared component and dynamically scaled per-agent components. By applying constraints directly to the policy architecture, DiCo leaves the learning objective unchanged, enabling its applicability to any actor-critic MARL algorithm. We theoretically prove that DiCo achieves the desired diversity, and we provide several experiments, both in cooperative and competitive tasks, that show how DiCo can be employed as a novel paradigm to increase performance and sample efficiency in MARL. Multimedia results are available on the paper’s website: https://sites.google.com/view/dico-marl'
volume: 235
URL: https://proceedings.mlr.press/v235/bettini24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bettini24a/bettini24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bettini24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matteo
family: Bettini
- given: Ryan
family: Kortvelesy
- given: Amanda
family: Prorok
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3611-3636
id: bettini24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3611
lastpage: 3636
published: 2024-07-08 00:00:00 +0000
- title: 'Refining Minimax Regret for Unsupervised Environment Design'
abstract: 'In unsupervised environment design, reinforcement learning agents are trained on environment configurations (levels) generated by an adversary that maximises some objective. Regret is a commonly used objective that theoretically results in a minimax regret (MMR) policy with desirable robustness guarantees; in particular, the agent’s maximum regret is bounded. However, once the agent reaches this regret bound on all levels, the adversary will only sample levels where regret cannot be further reduced. Although there may be possible performance improvements to be made outside of these regret-maximising levels, learning stagnates. In this work, we introduce *Bayesian level-perfect MMR* (BLP), a refinement of the minimax regret objective that overcomes this limitation. We formally show that solving for this objective results in a subset of MMR policies, and that BLP policies act consistently with a Perfect Bayesian policy over all levels. We further introduce an algorithm, *ReMiDi*, that results in a BLP policy at convergence. We empirically demonstrate that training on levels from a minimax regret adversary causes learning to prematurely stagnate, but that ReMiDi continues learning.'
volume: 235
URL: https://proceedings.mlr.press/v235/beukman24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/beukman24a/beukman24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-beukman24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michael
family: Beukman
- given: Samuel
family: Coward
- given: Michael
family: Matthews
- given: Mattie
family: Fellows
- given: Minqi
family: Jiang
- given: Michael D
family: Dennis
- given: Jakob Nicolaus
family: Foerster
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3637-3657
id: beukman24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3637
lastpage: 3657
published: 2024-07-08 00:00:00 +0000
- title: 'Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation'
abstract: 'To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding methods propose to enforce strict formal language constraints during generation. However, as we show in this work, not only do such methods often incur performance overhead during generation, but many of them also significantly impair task accuracy, if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even almost 2$\times$ speedup over unconstrained decoding – thereby outperforming existing approaches by a wide margin. We release DOMINO as open source at https://github.com/eth-sri/domino.'
volume: 235
URL: https://proceedings.mlr.press/v235/beurer-kellner24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/beurer-kellner24a/beurer-kellner24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-beurer-kellner24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Luca
family: Beurer-Kellner
- given: Marc
family: Fischer
- given: Martin
family: Vechev
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3658-3673
id: beurer-kellner24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3658
lastpage: 3673
published: 2024-07-08 00:00:00 +0000
- title: 'Prompt Sketching for Large Language Models'
abstract: 'Many recent prompting strategies for large language models (LLMs) query the model multiple times sequentially – first to produce intermediate results and then the final answer. However, using these methods, both decoder and model are unaware of potential follow-up prompts, leading to disconnected and undesirably wordy intermediate responses. In this work, we address this issue by proposing prompt sketching, a new prompting paradigm in which an LLM does not only respond by completing a prompt, but by predicting values for multiple variables in a template. This way, sketching grants users more control over the generation process, e.g., by providing a reasoning framework via intermediate instructions, leading to better overall results. The key idea enabling sketching with existing, autoregressive models is to adapt the decoding procedure to also score follow-up instructions during text generation, thus optimizing overall template likelihood in inference. Our experiments show that in a zero-shot setting, prompt sketching outperforms existing, sequential prompting schemes such as direct asking or chain-of-thought on 7 out of 8 LLM benchmarking tasks, including state tracking, arithmetic reasoning, and general question answering. To facilitate future use, we release a number of generic, yet effective sketches applicable to many tasks, and an open source library called dclib, powering our sketch-aware decoders as part of https://github.com/eth-sri/lmql.'
volume: 235
URL: https://proceedings.mlr.press/v235/beurer-kellner24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/beurer-kellner24b/beurer-kellner24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-beurer-kellner24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Luca
family: Beurer-Kellner
- given: Mark Niklas
family: Mueller
- given: Marc
family: Fischer
- given: Martin
family: Vechev
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3674-3706
id: beurer-kellner24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3674
lastpage: 3706
published: 2024-07-08 00:00:00 +0000
- title: 'Counterfactual Metarules for Local and Global Recourse'
abstract: 'We introduce **T-CREx**, a novel model-agnostic method for local and global counterfactual explanation (CE), which summarises recourse options for both individuals and groups in the form of generalised rules. It leverages tree-based surrogate models to learn the counterfactual rules, alongside *metarules* denoting their regimes of optimality, providing both a global analysis of model behaviour and diverse recourse options for users. Experiments indicate that **T-CREx** achieves superior aggregate performance over existing rule-based baselines on a range of CE desiderata, while being orders of magnitude faster to run.'
volume: 235
URL: https://proceedings.mlr.press/v235/bewley24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bewley24a/bewley24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bewley24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tom
family: Bewley
- given: Salim
family: I. Amoukou
- given: Saumitra
family: Mishra
- given: Daniele
family: Magazzeni
- given: Manuela
family: Veloso
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3707-3724
id: bewley24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3707
lastpage: 3724
published: 2024-07-08 00:00:00 +0000
- title: 'Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features'
abstract: 'The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints that arise in machine learning applications. In recent years, stochastic versions of FW have gained popularity, motivated by large datasets for which the computation of the full gradient is prohibitively expensive. In this paper, we present two new variants of the FW algorithms for stochastic finite-sum minimization. Our algorithms have the best convergence guarantees of existing stochastic FW approaches for both convex and non-convex objective functions. Our methods do not have the issue of permanently collecting large batches, which is common to many stochastic projection-free approaches. Moreover, our second approach does not require either large batches or full deterministic gradients, which is a typical weakness of many techniques for finite-sum problems. The faster theoretical rates of our approaches are confirmed experimentally.'
volume: 235
URL: https://proceedings.mlr.press/v235/beznosikov24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/beznosikov24a/beznosikov24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-beznosikov24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aleksandr
family: Beznosikov
- given: David
family: Dobre
- given: Gauthier
family: Gidel
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3725-3750
id: beznosikov24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3725
lastpage: 3750
published: 2024-07-08 00:00:00 +0000
- title: 'Position: Scaling Simulation is Neither Necessary Nor Sufficient for In-the-Wild Robot Manipulation'
abstract: 'In this paper, we develop a structured critique of robotic simulations for real-world manipulation, by arguing that scaling simulators is neither necessary nor sufficient for making progress in general-purpose real-world robotic manipulation agents that are compliant with human preferences. With the ubiquity of robotic simulators, and recent efforts to scale them for diverse tasks, and at the same time the interest in generally capable real-world manipulation systems, we believe it is important to address the limitations of using simulation for real-world manipulation, so that as a community, we can focus our collective resources, energy, and time on approaches that have more principled odds of success. We further demonstrate the unique challenges that real-world manipulation presents, and show through examples and arguments why scaling simulation doesn’t get us closer to solving these challenges required for diverse real-world deployment.'
volume: 235
URL: https://proceedings.mlr.press/v235/bharadhwaj24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bharadhwaj24a/bharadhwaj24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bharadhwaj24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Homanga
family: Bharadhwaj
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3751-3762
id: bharadhwaj24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3751
lastpage: 3762
published: 2024-07-08 00:00:00 +0000
- title: 'Dynamic Facility Location in High Dimensional Euclidean Spaces'
abstract: 'We study the facility location problem in the dynamic setting, where the goal is to efficiently process an intermixed sequence of point insertions and deletions while maintaining a high quality and stable solution. Although the problem has been studied in the context of general metrics and low-dimensional spaces, much remains unknown concerning dynamic facility location in high dimensional spaces. In this work, we present the first fully dynamic algorithm for facility location in high-dimensional spaces $\mathbb{R}^{d}$. For any $c \geq 1$, our algorithm achieves $O(c)$-approximation, supports point updates in $\tilde{O}(\mathrm{poly}(d)n^{1/c + o(1)})$ amortized time and incurs $O(1)$ amortized recourse. More generally, our result shows that despite the linear-time lower bound on the update time for general metrics, it is possible to achieve sub-linear update times for metric spaces that admit dynamic nearest neighbour oracles. Experiments on real datasets confirm that our algorithm achieves high-quality solutions with low running time, and incurs minimal recourse.'
volume: 235
URL: https://proceedings.mlr.press/v235/bhattacharya24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bhattacharya24a/bhattacharya24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bhattacharya24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sayan
family: Bhattacharya
- given: Gramoz
family: Goranci
- given: Shaofeng H.-C.
family: Jiang
- given: Yi
family: Qian
- given: Yubo
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3763-3775
id: bhattacharya24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3763
lastpage: 3775
published: 2024-07-08 00:00:00 +0000
- title: 'Total Variation Distance Meets Probabilistic Inference'
abstract: 'In this paper, we establish a novel connection between total variation (TV) distance estimation and probabilistic inference. In particular, we present an efficient, structure-preserving reduction from relative approximation of TV distance to probabilistic inference over directed graphical models. This reduction leads to a fully polynomial randomized approximation scheme (FPRAS) for estimating TV distances between same-structure distributions over any class of Bayes nets for which there is an efficient probabilistic inference algorithm. In particular, it leads to an FPRAS for estimating TV distances between distributions that are defined over a common Bayes net of small treewidth. Prior to this work, such approximation schemes only existed for estimating TV distances between product distributions. Our approach employs a new notion of *partial* couplings of high-dimensional distributions, which might be of independent interest.'
volume: 235
URL: https://proceedings.mlr.press/v235/bhattacharyya24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bhattacharyya24a/bhattacharyya24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bhattacharyya24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arnab
family: Bhattacharyya
- given: Sutanu
family: Gayen
- given: Kuldeep S.
family: Meel
- given: Dimitrios
family: Myrisiotis
- given: A.
family: Pavan
- given: N. V.
family: Vinodchandran
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3776-3794
id: bhattacharyya24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3776
lastpage: 3794
published: 2024-07-08 00:00:00 +0000
- title: 'Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling'
abstract: 'Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.'
volume: 235
URL: https://proceedings.mlr.press/v235/bhirangi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bhirangi24a/bhirangi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bhirangi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Raunaq
family: Bhirangi
- given: Chenyu
family: Wang
- given: Venkatesh
family: Pattabiraman
- given: Carmel
family: Majidi
- given: Abhinav
family: Gupta
- given: Tess
family: Hellebrekers
- given: Lerrel
family: Pinto
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3795-3816
id: bhirangi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3795
lastpage: 3816
published: 2024-07-08 00:00:00 +0000
- title: 'Why do Variational Autoencoders Really Promote Disentanglement?'
abstract: 'Despite not being designed for this purpose, the use of variational autoencoders (VAEs) has proven remarkably effective for disentangled representation learning (DRL). Recent research attributes this success to certain characteristics of the loss function that prevent latent space rotation, or hypothesizes about the orthogonality properties of the decoder by drawing parallels with principal component analysis (PCA). This hypothesis, however, has only been tested experimentally for linear VAEs, and the theoretical justification still remains an open problem. Moreover, since real-world VAEs are often inherently non-linear due to the use of neural architectures, understanding DRL capabilities of real-world VAEs remains a critical task. Our work takes a step towards understanding disentanglement in real-world VAEs to theoretically establish how the orthogonality properties of the decoder promote disentanglement in practical applications. Complementary to our theoretical contributions, our experimental results corroborate our analysis. Code is available at https://github.com/criticalml-uw/Disentanglement-in-VAE.'
volume: 235
URL: https://proceedings.mlr.press/v235/bhowal24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bhowal24a/bhowal24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bhowal24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pratik
family: Bhowal
- given: Achint
family: Soni
- given: Sirisha
family: Rambhatla
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3817-3849
id: bhowal24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3817
lastpage: 3849
published: 2024-07-08 00:00:00 +0000
- title: 'Best of Both Worlds Guarantees for Smoothed Online Quadratic Optimization'
abstract: 'We study the smoothed online quadratic optimization (SOQO) problem where, at each round $t$, a player plays an action $x_t$ in response to a quadratic hitting cost and an additional squared $\ell_2$-norm cost for switching actions. This problem class has strong connections to a wide range of application domains including smart grid management, adaptive control, and data center management, where switching-efficient algorithms are highly sought after. We study the SOQO problem in both adversarial and stochastic settings, and in this process, perform the first stochastic analysis of this class of problems. We provide the online optimal algorithm when the minimizers of the hitting cost function evolve as a general stochastic process, which, for the case of a martingale process, takes the form of a *distribution-agnostic dynamic interpolation algorithm* that we call Lazy Adaptive Interpolation (LAI). Next, we present the stochastic-adversarial trade-off by proving an $\Omega(T)$ expected regret for the adversarial optimal algorithm in the literature (ROBD) with respect to LAI, and a sub-optimal competitive ratio for LAI in the adversarial setting. Finally, we present a best-of-both-worlds algorithm that obtains a robust adversarial performance while simultaneously achieving a near-optimal stochastic performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/bhuyan24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bhuyan24a/bhuyan24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bhuyan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Neelkamal
family: Bhuyan
- given: Debankur
family: Mukherjee
- given: Adam
family: Wierman
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3850-3888
id: bhuyan24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3850
lastpage: 3888
published: 2024-07-08 00:00:00 +0000
- title: 'Multi-Patch Prediction: Adapting Language Models for Time Series Representation Learning'
abstract: 'In this study, we present $\text{aL\small{LM}4T\small{S}}$, an innovative framework that adapts Large Language Models (LLMs) for time-series representation learning. Central to our approach is that we reconceive time-series forecasting as a self-supervised, multi-patch prediction task, which, compared to traditional mask-and-reconstruction methods, captures temporal dynamics in patch representations more effectively. Our strategy encompasses two-stage training: (i) a causal continual pre-training phase on various time-series datasets, anchored on next patch prediction, effectively syncing LLM capabilities with the intricacies of time-series data; (ii) fine-tuning for multi-patch prediction in the targeted time-series context. A distinctive element of our framework is the patch-wise decoding layer, which departs from previous methods reliant on sequence-level decoding. Such a design directly transposes individual patches into temporal sequences, thereby significantly bolstering the model’s proficiency in mastering temporal patch-based representations. $\text{aL\small{LM}4T\small{S}}$ demonstrates superior performance in several downstream tasks, proving its effectiveness in deriving temporal representations with enhanced transferability and marking a pivotal advancement in the adaptation of LLMs for time-series analysis.'
volume: 235
URL: https://proceedings.mlr.press/v235/bian24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bian24a/bian24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuxuan
family: Bian
- given: Xuan
family: Ju
- given: Jiangtong
family: Li
- given: Zhijian
family: Xu
- given: Dawei
family: Cheng
- given: Qiang
family: Xu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3889-3912
id: bian24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3889
lastpage: 3912
published: 2024-07-08 00:00:00 +0000
- title: 'Naive Bayes Classifiers over Missing Data: Decision and Poisoning'
abstract: 'We study the certifiable robustness of ML classifiers on dirty datasets that could contain missing values. A test point is certifiably robust for an ML classifier if the classifier returns the same prediction for that test point, regardless of which cleaned version (among exponentially many) of the dirty dataset the classifier is trained on. In this paper, we show theoretically that for Naive Bayes Classifiers (NBC) over dirty datasets with missing values: (i) there exists an efficient polynomial time algorithm to decide whether multiple input test points are all certifiably robust over a dirty dataset; and (ii) the data poisoning attack, which aims to make all input test points certifiably non-robust by inserting missing cells to the clean dataset, is in polynomial time for single test points but NP-complete for multiple test points. Extensive experiments demonstrate that our algorithms are efficient and outperform existing baselines.'
volume: 235
URL: https://proceedings.mlr.press/v235/bian24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bian24b/bian24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bian24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Song
family: Bian
- given: Xiating
family: Ouyang
- given: Zhiwei
family: Fan
- given: Paraschos
family: Koutris
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3913-3934
id: bian24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3913
lastpage: 3934
published: 2024-07-08 00:00:00 +0000
- title: 'How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis'
abstract: 'Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources. With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate. In this paper, we study how well LLMs can negotiate with each other. We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents. We implemented three types of scenarios in NegotiationArena to assess LLMs’ behaviors in allocating shared resources (ultimatum games), aggregating resources (trading games), and buying/selling goods (price negotiations). Each scenario allows for multiple turns of flexible dialogues between LLM agents to allow for more complex negotiations. Interestingly, LLM agents can significantly boost their negotiation outcomes by employing certain behavioral tactics. For example, by pretending to be desolate and desperate, LLMs can improve their payoffs by 20% when negotiating against the standard GPT-4. We also quantify irrational negotiation behaviors exhibited by the LLM agents, many of which also appear in humans. Together, NegotiationArena offers a new environment to investigate LLM interactions, enabling new insights into LLMs’ theory of mind, irrationality, and reasoning abilities.'
volume: 235
URL: https://proceedings.mlr.press/v235/bianchi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bianchi24a/bianchi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bianchi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Federico
family: Bianchi
- given: Patrick John
family: Chia
- given: Mert
family: Yuksekgonul
- given: Jacopo
family: Tagliabue
- given: Dan
family: Jurafsky
- given: James
family: Zou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3935-3951
id: bianchi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3935
lastpage: 3951
published: 2024-07-08 00:00:00 +0000
- title: 'Scalable Safe Policy Improvement for Factored Multi-Agent MDPs'
abstract: 'In this work, we focus on safe policy improvement in multi-agent domains where current state-of-the-art methods cannot be effectively applied because of large state and action spaces. We consider recent results using Monte Carlo Tree Search for Safe Policy Improvement with Baseline Bootstrapping and propose a novel algorithm that scales this approach to multi-agent domains, exploiting the factorization of the transition model and value function. Given a centralized behavior policy and a dataset of trajectories, our algorithm generates an improved policy by selecting joint actions using a novel extension of Max-Plus (or Variable Elimination) that constrains local actions to guarantee safety criteria. An empirical evaluation on multi-agent SysAdmin and multi-UAV Delivery shows that the approach scales to very large domains where state-of-the-art methods cannot work.'
volume: 235
URL: https://proceedings.mlr.press/v235/bianchi24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bianchi24b/bianchi24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bianchi24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Federico
family: Bianchi
- given: Edoardo
family: Zorzi
- given: Alberto
family: Castellini
- given: Thiago D.
family: Simão
- given: Matthijs T. J.
family: Spaan
- given: Alessandro
family: Farinelli
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3952-3973
id: bianchi24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3952
lastpage: 3973
published: 2024-07-08 00:00:00 +0000
- title: 'Improving fine-grained understanding in image-text pre-training'
abstract: 'We introduce SPARse fine-grained Contrastive alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. Given that multiple image patches often correspond to single words, we propose to learn a grouping of image patches for every token in the caption. To achieve this, we use a sparse similarity metric between image patches and language tokens and compute for each token a language-grouped vision embedding as the weighted average of patches. The token and language-grouped vision embeddings are then contrasted through a fine-grained sequence-wise loss that only depends on individual samples and does not require other batch samples as negatives, i.e., more detailed information is encoded in a computationally inexpensive way. SPARC combines this fine-grained loss with a contrastive loss between global image and text embeddings to learn representations that simultaneously encode global and local information. We thoroughly evaluate SPARC and show improved performance over competing approaches both on image-level tasks relying on coarse-grained information, e.g., classification, as well as region-level tasks relying on fine-grained information, e.g., retrieval, object detection, and segmentation, while also improving model faithfulness and captioning in foundational vision-language models.'
volume: 235
URL: https://proceedings.mlr.press/v235/bica24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bica24a/bica24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bica24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ioana
family: Bica
- given: Anastasija
family: Ilic
- given: Matthias
family: Bauer
- given: Goker
family: Erdogan
- given: Matko
family: Bošnjak
- given: Christos
family: Kaplanis
- given: Alexey A.
family: Gritsenko
- given: Matthias
family: Minderer
- given: Charles
family: Blundell
- given: Razvan
family: Pascanu
- given: Jovana
family: Mitrovic
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3974-3995
id: bica24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3974
lastpage: 3995
published: 2024-07-08 00:00:00 +0000
- title: 'Position: Explain to Question not to Justify'
abstract: 'Explainable Artificial Intelligence (XAI) is a young but very promising field of research. Unfortunately, the progress in this field is currently slowed down by divergent and incompatible goals. We separate various threads tangled within the area of XAI into two complementary cultures of human/value-oriented explanations (BLUE XAI) and model/validation-oriented explanations (RED XAI). This position paper argues that the area of RED XAI is currently under-explored, i.e., more methods for explainability are desperately needed to question models (e.g., extract knowledge from well-performing models as well as spotting and fixing bugs in faulty models), and the area of RED XAI hides great opportunities and potential for important research necessary to ensure the safety of AI systems. We conclude this paper by presenting promising challenges in this area.'
volume: 235
URL: https://proceedings.mlr.press/v235/biecek24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/biecek24a/biecek24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-biecek24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Przemyslaw
family: Biecek
- given: Wojciech
family: Samek
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 3996-4006
id: biecek24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 3996
lastpage: 4006
published: 2024-07-08 00:00:00 +0000
- title: 'ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections'
abstract: 'Parameter-efficient finetuning (PEFT) has become ubiquitous to adapt foundation models to downstream task requirements while retaining their generalization ability. However, the amount of additionally introduced parameters and compute for successful adaptation and hyperparameter searches can explode quickly, especially when deployed at scale to serve numerous individual requests. To ensure effective, parameter-efficient, and hyperparameter-robust adaptation, we propose the *ETHER* transformation family, which performs Efficient fineTuning via HypErplane Reflections. By design, *ETHER* transformations require *a minimal number of parameters*, are *less likely to deteriorate model performance*, and exhibit *robustness to hyperparameter and learning rate choices*. In particular, we introduce *ETHER* and its relaxation *ETHER+*, which match or outperform existing PEFT methods with significantly fewer parameters ($\sim$$10$-$100$ times lower than LoRA or OFT) across multiple image synthesis and natural language tasks without *exhaustive hyperparameter tuning*. Finally, we investigate the recent emphasis on Hyperspherical Energy retention for adaptation and raise questions on its practical utility. The code is available at https://github.com/mwbini/ether.'
volume: 235
URL: https://proceedings.mlr.press/v235/bini24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bini24a/bini24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bini24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Massimo
family: Bini
- given: Karsten
family: Roth
- given: Zeynep
family: Akata
- given: Anna
family: Khoreva
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4007-4026
id: bini24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4007
lastpage: 4026
published: 2024-07-08 00:00:00 +0000
- title: 'Incorporating Information into Shapley Values: Reweighting via a Maximum Entropy Approach'
abstract: 'Both the marginal contributions needed for the computation of Shapley values and the graph produced by Pearl-Verma theorem rely on the choice of an ordering of the variables. For Shapley values, the marginal contributions are averaged over all orderings, while in causal inference methods, the typical approach is to select orderings producing a graph with a minimal number of edges. We reconcile both approaches by reinterpreting them from a maximum entropy perspective. Namely, Shapley values assume no prior knowledge about the orderings and treat them as equally likely, while causal inference approaches apply Occam’s razor and consider only orderings producing the simplest explanatory graphs. We find that the blind application of Occam’s razor to Shapley values does not produce fully satisfactory explanations. Hence, we propose two variations of Shapley values based on entropy maximization to appropriately incorporate prior information about the model.'
volume: 235
URL: https://proceedings.mlr.press/v235/biparva24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/biparva24a/biparva24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-biparva24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Darya
family: Biparva
- given: Donatello
family: Materassi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4027-4045
id: biparva24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4027
lastpage: 4045
published: 2024-07-08 00:00:00 +0000
- title: 'Graph2Tac: Online Representation Learning of Formal Math Concepts'
abstract: 'In proof assistants, the physical proximity between two formal mathematical concepts is a strong predictor of their mutual relevance. Furthermore, lemmas with close proximity regularly exhibit similar proof structures. We show that this *locality* property can be exploited through online learning techniques to obtain solving agents that far surpass offline learners when asked to prove theorems in an unseen mathematical setting. We extensively benchmark two such online solvers implemented in the Tactician platform for the Coq proof assistant: First, Tactician’s online $k$-nearest neighbor solver, which can learn from recent proofs, shows a $1.72\times$ improvement in theorems proved over an offline equivalent. Second, we introduce a graph neural network, Graph2Tac, with a novel approach to build hierarchical representations for new definitions. Graph2Tac’s online definition task realizes a $1.5\times$ improvement in theorems solved over an offline baseline. The $k$-NN and Graph2Tac solvers rely on orthogonal online data, making them highly complementary. Their combination improves $1.27\times$ over their individual performances. Both solvers outperform all other general purpose provers for Coq, including CoqHammer, Proverbot9001, and a transformer baseline by at least $1.48\times$ and are available for practical use by end-users.'
volume: 235
URL: https://proceedings.mlr.press/v235/blaauwbroek24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/blaauwbroek24a/blaauwbroek24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-blaauwbroek24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lasse
family: Blaauwbroek
- given: Mirek
family: Olšák
- given: Jason
family: Rute
- given: Fidel Ivan
family: Schaposnik Massolo
- given: Jelle
family: Piepenbrock
- given: Vasily
family: Pestun
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4046-4076
id: blaauwbroek24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4046
lastpage: 4076
published: 2024-07-08 00:00:00 +0000
- title: 'Biharmonic Distance of Graphs and its Higher-Order Variants: Theoretical Properties with Applications to Centrality and Clustering'
abstract: 'Effective resistance is a distance between vertices of a graph that is both theoretically interesting and useful in applications. We study a variant of effective resistance called the biharmonic distance. While the effective resistance measures how well-connected two vertices are, we prove several theoretical results supporting the idea that the biharmonic distance measures how important an edge is to the global topology of the graph. Our theoretical results connect the biharmonic distance to well-known measures of connectivity of a graph like its total resistance and sparsity. Based on these results, we introduce two clustering algorithms using the biharmonic distance. Finally, we introduce a further generalization of the biharmonic distance that we call the $k$-harmonic distance. We empirically study the utility of biharmonic and $k$-harmonic distance for edge centrality and graph clustering.'
volume: 235
URL: https://proceedings.mlr.press/v235/black24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/black24a/black24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-black24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mitchell
family: Black
- given: Lucy
family: Lin
- given: Weng-Keen
family: Wong
- given: Amir
family: Nayyeri
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4077-4102
id: black24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4077
lastpage: 4102
published: 2024-07-08 00:00:00 +0000
- title: 'Comparing Graph Transformers via Positional Encodings'
abstract: 'The distinguishing power of graph transformers is tied to the choice of *positional encoding*: features used to augment the base transformer with information about the graph. There are two primary types of positional encoding: *absolute positional encodings (APEs)* and *relative positional encodings (RPEs)*. APEs assign features to each node and are given as input to the transformer. RPEs instead assign a feature to each *pair of nodes*, e.g., shortest-path distance, and are used to augment the attention block. A priori, it is unclear which method is better for maximizing the power of the resulting graph transformer. In this paper, we aim to understand the relationship between these different types of positional encodings. Interestingly, we show that graph transformers using APEs and RPEs are equivalent in their ability to distinguish non-isomorphic graphs. In particular, we demonstrate how to interchange APEs and RPEs while maintaining their distinguishing power in terms of graph transformers. However, in the case of graphs with node features, we show that RPEs may have an advantage over APEs. Based on our theoretical results, we provide a study of different APEs and RPEs—including the shortest-path and resistance distance and the recently introduced stable and expressive positional encoding (SPE)—and compare their distinguishing power in terms of transformers. We believe our work will help navigate the vast number of positional encoding choices and provide guidance on the future design of positional encodings for graph transformers.'
volume: 235
URL: https://proceedings.mlr.press/v235/black24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/black24b/black24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-black24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mitchell
family: Black
- given: Zhengchao
family: Wan
- given: Gal
family: Mishne
- given: Amir
family: Nayyeri
- given: Yusu
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4103-4139
id: black24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4103
lastpage: 4139
published: 2024-07-08 00:00:00 +0000
- title: 'Stability Evaluation through Distributional Perturbation Analysis'
abstract: 'The performance of learning models often deteriorates when deployed in out-of-sample environments. To ensure reliable deployment, we propose a stability evaluation criterion based on distributional perturbations. Conceptually, our stability evaluation criterion is defined as the minimal perturbation required on our observed dataset to induce a prescribed deterioration in risk evaluation. In this paper, we utilize the optimal transport (OT) discrepancy with moment constraints on the (sample, density) space to quantify this perturbation. Therefore, our stability evaluation criterion can address both data corruptions and sub-population shifts—the two most common types of distribution shifts in real-world scenarios. To further realize practical benefits, we present a series of tractable convex formulations and computational methods tailored to different classes of loss functions. The key technical tool to achieve this is the strong duality theorem provided in this paper. Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications. These empirical studies showcase the criterion’s ability not only to compare the stability of different learning models and features but also to provide valuable guidelines and strategies to further improve models.'
volume: 235
URL: https://proceedings.mlr.press/v235/blanchet24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/blanchet24a/blanchet24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-blanchet24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jose
family: Blanchet
- given: Peng
family: Cui
- given: Jiajin
family: Li
- given: Jiashuo
family: Liu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4140-4159
id: blanchet24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4140
lastpage: 4159
published: 2024-07-08 00:00:00 +0000
- title: 'Dynamic Survival Analysis with Controlled Latent States'
abstract: 'We consider the task of learning individual-specific intensities of counting processes from a set of static variables and irregularly sampled time series. We introduce a novel modelization approach in which the intensity is the solution to a controlled differential equation. We first design a neural estimator by building on neural controlled differential equations. In a second step, we show that our model can be linearized in the signature space under sufficient regularity conditions, yielding a signature-based estimator which we call CoxSig. We provide theoretical learning guarantees for both estimators, before showcasing the performance of our models on a vast array of simulated and real-world datasets from finance, predictive maintenance and food supply chain management.'
volume: 235
URL: https://proceedings.mlr.press/v235/bleistein24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bleistein24a/bleistein24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bleistein24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Linus
family: Bleistein
- given: Van Tuan
family: Nguyen
- given: Adeline
family: Fermanian
- given: Agathe
family: Guilloux
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4160-4204
id: bleistein24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4160
lastpage: 4204
published: 2024-07-08 00:00:00 +0000
- title: 'Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling'
abstract: 'Monte Carlo methods, Variational Inference, and their combinations play a pivotal role in sampling from intractable probability distributions. However, current studies lack a unified evaluation framework, relying on disparate performance measures and limited method comparisons across diverse tasks, complicating the assessment of progress and hindering the decision-making of practitioners. In response to these challenges, our work introduces a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria. Moreover, we study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose. Our findings provide insights into strengths and weaknesses of existing sampling methods, serving as a valuable reference for future developments.'
volume: 235
URL: https://proceedings.mlr.press/v235/blessing24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/blessing24a/blessing24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-blessing24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Denis
family: Blessing
- given: Xiaogang
family: Jia
- given: Johannes
family: Esslinger
- given: Francisco
family: Vargas
- given: Gerhard
family: Neumann
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4205-4229
id: blessing24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4205
lastpage: 4229
published: 2024-07-08 00:00:00 +0000
- title: 'Shifted Interpolation for Differential Privacy'
abstract: 'Noisy gradient descent and its variants are the predominant algorithms for differentially private machine learning. It is a fundamental question to quantify their privacy leakage, yet tight characterizations remain open even in the foundational setting of convex losses. This paper improves over previous analyses by establishing (and refining) the “privacy amplification by iteration” phenomenon in the unifying framework of $f$-differential privacy—which tightly captures all aspects of the privacy loss and immediately implies tighter privacy accounting in other notions of differential privacy, e.g., $(\varepsilon,\delta)$-DP and Rényi DP. Our key technical insight is the construction of *shifted interpolated processes* that unravel the popular shifted-divergences argument, enabling generalizations beyond divergence-based relaxations of DP. Notably, this leads to the first *exact* privacy analysis in the foundational setting of strongly convex optimization. Our techniques extend to many settings: convex/strongly convex, constrained/unconstrained, full/cyclic/stochastic batches, and all combinations thereof. As an immediate corollary, we recover the $f$-DP characterization of the exponential mechanism for strongly convex optimization in Gopi et al. (2022), and moreover extend this result to more general settings.'
volume: 235
URL: https://proceedings.mlr.press/v235/bok24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bok24a/bok24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bok24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jinho
family: Bok
- given: Weijie J
family: Su
- given: Jason
family: Altschuler
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4230-4266
id: bok24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4230
lastpage: 4266
published: 2024-07-08 00:00:00 +0000
- title: 'How Spurious Features are Memorized: Precise Analysis for Random and NTK Features'
abstract: 'Deep learning models are known to overfit and memorize spurious features in the training dataset. While numerous empirical studies have aimed at understanding this phenomenon, a rigorous theoretical framework to quantify it is still missing. In this paper, we consider spurious features that are uncorrelated with the learning task, and we provide a precise characterization of how they are memorized via two separate terms: *(i)* the *stability* of the model with respect to individual training samples, and *(ii)* the *feature alignment* between the spurious pattern and the full sample. While the first term is well established in learning theory and it is connected to the generalization error in classical work, the second one is, to the best of our knowledge, novel. Our key technical result gives a precise characterization of the feature alignment for the two prototypical settings of random features (RF) and neural tangent kernel (NTK) regression. We prove that the memorization of spurious features weakens as the generalization capability increases and, through the analysis of the feature alignment, we unveil the role of the model and of its activation function. Numerical experiments show the predictive power of our theory on standard datasets (MNIST, CIFAR-10).'
volume: 235
URL: https://proceedings.mlr.press/v235/bombari24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bombari24a/bombari24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bombari24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simone
family: Bombari
- given: Marco
family: Mondelli
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4267-4299
id: bombari24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4267
lastpage: 4299
published: 2024-07-08 00:00:00 +0000
- title: 'Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features'
abstract: 'Understanding the reasons behind the exceptional success of transformers requires a better analysis of why attention layers are suitable for NLP tasks. In particular, such tasks require predictive models to capture contextual meaning which often depends on one or few words, even if the sentence is long. Our work studies this key property, dubbed *word sensitivity* (WS), in the prototypical setting of random features. We show that attention layers enjoy high WS, namely, there exists a vector in the space of embeddings that largely perturbs the random attention features map. The argument critically exploits the role of the softmax in the attention layer, highlighting its benefit compared to other activations (e.g., ReLU). In contrast, the WS of standard random features is of order $1/\sqrt{n}$, $n$ being the number of words in the textual sample, and thus it decays with the length of the context. We then translate these results on the word sensitivity into generalization bounds: due to their low WS, random features provably cannot learn to distinguish between two sentences that differ only in a single word; in contrast, due to their high WS, random attention features have higher generalization capabilities. We validate our theoretical results with experimental evidence over the BERT-Base word embeddings of the IMDB review dataset.'
volume: 235
URL: https://proceedings.mlr.press/v235/bombari24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bombari24b/bombari24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bombari24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simone
family: Bombari
- given: Marco
family: Mondelli
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4300-4328
id: bombari24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4300
lastpage: 4328
published: 2024-07-08 00:00:00 +0000
- title: 'Position: Machine Learning-powered Assessments of the EU Digital Services Act Aid Quantify Policy Impacts on Online Harms'
abstract: 'While machine learning shows promise in automated knowledge generation, current techniques such as large language models and micro-targeted influence operations can be exploited for harmful purposes like the proliferation of disinformation. The European Union’s Digital Services Act (DSA) is an exemplary policy response addressing these harms generated by online platforms. In this regard, it necessitates a comprehensive evaluation of its impact on curbing the harmful downstream effects of these opaque practices. Despite their harmful applications, we argue that machine learning techniques offer immense, yet under-exploited, potential for unraveling the impacts of regulations like the DSA. Following an analysis that reveals possible limitations in the DSA’s provisions, we call for resolute efforts to address methodological barriers around appropriate data access, isolating marginal regulatory effects, and facilitating generalization across different contexts. Given the identified advantages of data-driven approaches to regulatory delivery, we advocate for machine learning research to help quantify the policy impacts on online harms.'
volume: 235
URL: https://proceedings.mlr.press/v235/bonel24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bonel24a/bonel24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bonel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eleonora
family: Bonel
- given: Luca
family: Nannini
- given: Davide
family: Bassi
- given: Michele Joshua
family: Maggini
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4329-4344
id: bonel24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4329
lastpage: 4344
published: 2024-07-08 00:00:00 +0000
- title: 'A Dynamical Model of Neural Scaling Laws'
abstract: 'On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude. This phenomenon is known as a neural scaling law. Of fundamental importance is the compute-optimal scaling law, which reports the performance as a function of units of compute when choosing model sizes optimally. We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization. This reproduces many observations about neural scaling laws. First, our model makes a prediction about why the scalings of performance with training time and with model size have different power law exponents. Consequently, the theory predicts an asymmetric compute-optimal scaling rule where the number of training steps is increased faster than the number of model parameters, consistent with recent empirical observations. Second, it has been observed that early in training, networks converge to their infinite-width dynamics at a rate $1/\text{width}$ but at late time exhibit a rate $\text{width}^{-c}$, where $c$ depends on the structure of the architecture and task. We show that our model exhibits this behavior. Lastly, our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.'
volume: 235
URL: https://proceedings.mlr.press/v235/bordelon24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bordelon24a/bordelon24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bordelon24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Blake
family: Bordelon
- given: Alexander
family: Atanasov
- given: Cengiz
family: Pehlevan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4345-4382
id: bordelon24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4345
lastpage: 4382
published: 2024-07-08 00:00:00 +0000
- title: 'A New Computationally Efficient Algorithm to solve Feature Selection for Functional Data Classification in High-dimensional Spaces'
abstract: 'This paper introduces a novel methodology for Feature Selection for Functional Classification, FSFC, that addresses the challenge of jointly performing feature selection and classification of functional data in scenarios with categorical responses and multivariate longitudinal features. FSFC tackles a newly defined optimization problem that integrates logistic loss and functional features to identify the most crucial variables for classification. To address the minimization procedure, we employ functional principal components and develop a new adaptive version of the Dual Augmented Lagrangian algorithm. The computational efficiency of FSFC enables handling high-dimensional scenarios where the number of features may considerably exceed the number of statistical units. Simulation experiments demonstrate that FSFC outperforms other machine learning and deep learning methods in computational time and classification accuracy. Furthermore, the FSFC feature selection capability can be leveraged to significantly reduce the problem’s dimensionality and enhance the performances of other classification algorithms. The efficacy of FSFC is also demonstrated through a real data application, analyzing relationships between four chronic diseases and other health and demographic factors. FSFC source code is publicly available at https://github.com/IBM/funGCN.'
volume: 235
URL: https://proceedings.mlr.press/v235/boschi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/boschi24a/boschi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-boschi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tobia
family: Boschi
- given: Francesca
family: Bonin
- given: Rodrigo
family: Ordonez-Hurtado
- given: Alessandra
family: Pascale
- given: Jonathan P
family: Epperlein
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4383-4402
id: boschi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4383
lastpage: 4402
published: 2024-07-08 00:00:00 +0000
- title: 'Random matrix theory improved Fréchet mean of symmetric positive definite matrices'
abstract: 'In this study, we consider the realm of covariance matrices in machine learning, particularly focusing on computing Fréchet means on the manifold of symmetric positive definite matrices, commonly referred to as Karcher or geometric means. Such means are leveraged in numerous machine learning tasks. Relying on advanced statistical tools, we introduce a random matrix theory based method that estimates Fréchet means, which is particularly beneficial when dealing with low sample support and a high number of matrices to average. Our experimental evaluation, involving both synthetic and real-world EEG and hyperspectral datasets, shows that we largely outperform state-of-the-art methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/bouchard24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bouchard24a/bouchard24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bouchard24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Florent
family: Bouchard
- given: Ammar
family: Mian
- given: Malik
family: Tiomoko
- given: Guillaume
family: Ginolhac
- given: Frederic
family: Pascal
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4403-4415
id: bouchard24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4403
lastpage: 4415
published: 2024-07-08 00:00:00 +0000
- title: 'Improving Neural Additive Models with Bayesian Principles'
abstract: 'Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/bouchiat24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bouchiat24a/bouchiat24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bouchiat24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kouroche
family: Bouchiat
- given: Alexander
family: Immer
- given: Hugo
family: Yèche
- given: Gunnar
family: Ratsch
- given: Vincent
family: Fortuin
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4416-4443
id: bouchiat24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4416
lastpage: 4443
published: 2024-07-08 00:00:00 +0000
- title: 'S$Ω$I: Score-based O-INFORMATION Estimation'
abstract: 'The analysis of scientific data and complex multivariate systems requires information quantities that capture relationships among multiple random variables. Recently, new information-theoretic measures have been developed to overcome the shortcomings of classical ones, such as mutual information, that are restricted to considering pairwise interactions. Among them, the concept of information synergy and redundancy is crucial for understanding the high-order dependencies between variables. One of the most prominent and versatile measures based on this concept is *O-information*, which provides a clear and scalable way to quantify the synergy-redundancy balance in multivariate systems. However, its practical application is limited to simplified cases. In this work, we introduce **S$\Omega$I**, which allows computing *O-information* without restrictive assumptions about the system while leveraging a unique model. Our experiments validate our approach on synthetic data, and demonstrate the effectiveness of **S$\Omega$I** in the context of a real-world use case.'
volume: 235
URL: https://proceedings.mlr.press/v235/bounoua24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bounoua24a/bounoua24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bounoua24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mustapha
family: Bounoua
- given: Giulio
family: Franzese
- given: Pietro
family: Michiardi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4444-4471
id: bounoua24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4444
lastpage: 4471
published: 2024-07-08 00:00:00 +0000
- title: 'On dimensionality of feature vectors in MPNNs'
abstract: 'We revisit the result of Morris et al. (AAAI’19) that message-passing graph neural networks (MPNNs) are equal in their distinguishing power to the Weisfeiler–Leman (WL) isomorphism test. Morris et al. show their result with ReLU activation function and $O(n)$-dimensional feature vectors, where $n$ is the size of the graph. Recently, by introducing randomness into the architecture, Aamand et al. (NeurIPS’22) improved this bound to $O(\log n)$-dimensional feature vectors, although at the expense of guaranteeing perfect simulation only with high probability. In all these constructions, to guarantee equivalence to the WL test, the dimension of feature vectors in the MPNN has to increase with the size of the graphs. However, architectures used in practice have feature vectors of constant dimension. Thus, there is a gap between the guarantees provided by these results and the actual characteristics of architectures used in practice. In this paper we close this gap by showing that, for *any* non-polynomial analytic (like the sigmoid) activation function, to guarantee that MPNNs are equivalent to the WL test, feature vectors of dimension $d=1$ is all we need, independently of the size of the graphs. Our main technical insight is that for simulating multi-sets in the WL-test, it is enough to use linear independence of feature vectors over rationals instead of reals. Countability of the set of rationals together with nice properties of analytic functions allow us to carry out the simulation invariant over the iterations of the WL test without increasing the dimension of the feature vectors.'
volume: 235
URL: https://proceedings.mlr.press/v235/bravo24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bravo24a/bravo24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bravo24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: César
family: Bravo
- given: Alexander
family: Kozachinskiy
- given: Cristobal
family: Rojas
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4472-4481
id: bravo24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4472
lastpage: 4481
published: 2024-07-08 00:00:00 +0000
- title: 'Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics'
abstract: 'Many, if not most, systems of interest in science are naturally described as nonlinear dynamical systems. Empirically, we commonly access these systems through time series measurements. Often such time series may consist of discrete random variables rather than continuous measurements, or may be composed of measurements from multiple data modalities observed simultaneously. For instance, in neuroscience we may have behavioral labels in addition to spike counts and continuous physiological recordings. While by now there is a burgeoning literature on deep learning for dynamical systems reconstruction (DSR), multimodal data integration has hardly been considered in this context. Here we provide such an efficient and flexible algorithmic framework that rests on a multimodal variational autoencoder for generating a sparse teacher signal that guides training of a reconstruction model, exploiting recent advances in DSR training techniques. It enables combining various sources of information for optimal reconstruction, even allowing for reconstruction from symbolic data (class labels) alone, and connects different types of observations within a common latent dynamics space. In contrast to previous multimodal data integration techniques for scientific applications, our framework is fully generative, producing, after training, trajectories with the same geometrical and temporal structure as those of the ground truth system.'
volume: 235
URL: https://proceedings.mlr.press/v235/brenner24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/brenner24a/brenner24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-brenner24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Manuel
family: Brenner
- given: Florian
family: Hess
- given: Georgia
family: Koppe
- given: Daniel
family: Durstewitz
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4482-4516
id: brenner24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4482
lastpage: 4516
published: 2024-07-08 00:00:00 +0000
- title: 'Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees'
abstract: 'We study the problem of maintaining a decision tree in the fully-dynamic setting, where the dataset is updated by an adversarial sequence of insertions and deletions. We present the first algorithm with strong guarantees on both the quality of the tree and the worst-case update time (the maximum time spent between two consecutive dataset updates). For instance, we can maintain a tree where each node has Gini gain within $\beta$ of the optimum, while guaranteeing an update time $O(d \beta^{-3} \log^4 n )$, where $d$ is the number of features and $n$ the maximum size of the dataset. This is optimal up to polylogarithmic factors, as any dynamic algorithm must have update time in $\Omega(d)$. Similar guarantees hold for the variance and information gain, for classification and regression, and even for *boosted* trees. This shows that many popular decision trees such as ID3 or C4.5 can efficiently be made dynamic, answering an open question of Bressan, Damay and Sozio (AAAI 2023). We also show that, under the 3SUM conjecture or the Orthogonal Vectors Hypothesis, the update time must be polynomial in $1/\beta$.'
volume: 235
URL: https://proceedings.mlr.press/v235/bressan24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bressan24a/bressan24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bressan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marco
family: Bressan
- given: Mauro
family: Sozio
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4517-4541
id: bressan24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4517
lastpage: 4541
published: 2024-07-08 00:00:00 +0000
- title: 'Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu’s formula'
abstract: 'Computing homotopy groups of spheres has long been a fundamental objective in algebraic topology. Various theoretical and algorithmic approaches have been developed to tackle this problem. In this paper we take a step towards the goal of comprehending the group-theoretic structure of the generators of these homotopy groups by leveraging the power of machine learning. Specifically, in the simplicial group setting of Wu’s formula, we reformulate the problem of generating simplicial cycles as a problem of sampling from the intersection of algorithmic datasets related to Dyck languages. We present and evaluate language modelling approaches that employ multi-label information for input sequences, along with the necessary group-theoretic toolkit and non-neural baselines.'
volume: 235
URL: https://proceedings.mlr.press/v235/brilliantov24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/brilliantov24a/brilliantov24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-brilliantov24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kirill
family: Brilliantov
- given: Fedor
family: Pavutnitskiy
- given: Dmitry
family: Pasechnyuk
- given: German
family: Magai
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4542-4560
id: brilliantov24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4542
lastpage: 4560
published: 2024-07-08 00:00:00 +0000
- title: 'Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation'
abstract: 'We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss. Under modest assumptions on the input, we characterize the distribution of the iterate at each time step. Our analysis leads to new results on the algorithm’s accuracy: for a proper fixed choice of hyperparameters, the sample complexity depends only linearly on the dimension of the data. This matches the dimension-dependence of the (non-private) ordinary least squares estimator as well as that of recent private algorithms that rely on sophisticated adaptive gradient-clipping schemes (Varshney et al., 2022; Liu et al., 2023). Our analysis of the iterates’ distribution also allows us to construct confidence intervals for the empirical optimizer which adapt automatically to the variance of the algorithm on a particular data set. We validate our theorems through experiments on synthetic data.'
volume: 235
URL: https://proceedings.mlr.press/v235/brown24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/brown24a/brown24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-brown24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gavin R
family: Brown
- given: Krishnamurthy Dj
family: Dvijotham
- given: Georgina
family: Evans
- given: Daogao
family: Liu
- given: Adam
family: Smith
- given: Abhradeep
family: Guha Thakurta
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4561-4584
id: brown24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4561
lastpage: 4584
published: 2024-07-08 00:00:00 +0000
- title: 'Scalable AI Safety via Doubly-Efficient Debate'
abstract: 'The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. (2018) proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of identifying (mis)-alignment is broken down into a manageable subtask. While the promise of this approach is clear, the original framework was based on the assumption that the honest strategy is able to simulate *deterministic* AI systems for an *exponential* number of steps, limiting its applicability. In this paper, we show how to address these challenges by designing a new set of debate protocols where the honest strategy can always succeed using a simulation of a *polynomial* number of steps, whilst being able to verify the alignment of *stochastic* AI systems, even when the dishonest strategy is allowed to use exponentially many simulation steps.'
volume: 235
URL: https://proceedings.mlr.press/v235/brown-cohen24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/brown-cohen24a/brown-cohen24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-brown-cohen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonah
family: Brown-Cohen
- given: Geoffrey
family: Irving
- given: Georgios
family: Piliouras
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4585-4602
id: brown-cohen24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4585
lastpage: 4602
published: 2024-07-08 00:00:00 +0000
- title: 'Genie: Generative Interactive Environments'
abstract: 'We introduce Genie, the first *generative interactive environment* trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a *foundation world model*. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis *despite training without any ground-truth action labels* or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.'
volume: 235
URL: https://proceedings.mlr.press/v235/bruce24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bruce24a/bruce24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bruce24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jake
family: Bruce
- given: Michael D
family: Dennis
- given: Ashley
family: Edwards
- given: Jack
family: Parker-Holder
- given: Yuge
family: Shi
- given: Edward
family: Hughes
- given: Matthew
family: Lai
- given: Aditi
family: Mavalankar
- given: Richie
family: Steigerwald
- given: Chris
family: Apps
- given: Yusuf
family: Aytar
- given: Sarah Maria Elisabeth
family: Bechtle
- given: Feryal
family: Behbahani
- given: Stephanie C.Y.
family: Chan
- given: Nicolas
family: Heess
- given: Lucy
family: Gonzalez
- given: Simon
family: Osindero
- given: Sherjil
family: Ozair
- given: Scott
family: Reed
- given: Jingwei
family: Zhang
- given: Konrad
family: Zolna
- given: Jeff
family: Clune
- given: Nando De
family: Freitas
- given: Satinder
family: Singh
- given: Tim
family: Rocktäschel
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4603-4623
id: bruce24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4603
lastpage: 4623
published: 2024-07-08 00:00:00 +0000
- title: 'HAMLET: Graph Transformer Neural Operator for Partial Differential Equations'
abstract: 'We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to PDEs of arbitrary geometries and varied input formats. Notably, HAMLET scales effectively with increasing data complexity and noise, showcasing its robustness. HAMLET is not just tailored to a single type of physical simulation, but can be applied across various domains. Moreover, it boosts model resilience and performance, especially in scenarios with limited data. We demonstrate, through extensive experiments, that our framework is capable of outperforming current techniques for PDEs.'
volume: 235
URL: https://proceedings.mlr.press/v235/bryutkin24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bryutkin24a/bryutkin24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bryutkin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrey
family: Bryutkin
- given: Jiahao
family: Huang
- given: Zhongying
family: Deng
- given: Guang
family: Yang
- given: Carola-Bibiane
family: Schönlieb
- given: Angelica I
family: Aviles-Rivero
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4624-4641
id: bryutkin24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4624
lastpage: 4641
published: 2024-07-08 00:00:00 +0000
- title: 'Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples'
abstract: 'Neural Network-based active learning (NAL) is a cost-effective data selection technique that utilizes neural networks to select and train on a small subset of samples. While existing work successfully develops various effective or theory-justified NAL algorithms, the understanding of the two commonly used query criteria of NAL, uncertainty-based and diversity-based, remains in its infancy. In this work, we try to move one step forward by offering a unified explanation for the success of both query criteria-based NAL from a feature learning view. Specifically, we consider a feature-noise data model comprising easy-to-learn or hard-to-learn features disrupted by noise, and conduct analysis over 2-layer NN-based NALs in the pool-based scenario. We provably show that both uncertainty-based and diversity-based NAL are inherently amenable to one and the same principle, i.e., striving to prioritize samples that contain yet-to-be-learned features. We further prove that this shared principle is the key to their success: achieving small test error within a small labeled set. Contrastingly, the strategy-free passive learning exhibits a large test error due to the inadequate learning of yet-to-be-learned features, necessitating resort to a significantly larger label complexity for a sufficient test error reduction. Experimental results validate our findings.'
volume: 235
URL: https://proceedings.mlr.press/v235/bu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bu24a/bu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dake
family: Bu
- given: Wei
family: Huang
- given: Taiji
family: Suzuki
- given: Ji
family: Cheng
- given: Qingfu
family: Zhang
- given: Zhiqiang
family: Xu
- given: Hau-San
family: Wong
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4642-4695
id: bu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4642
lastpage: 4695
published: 2024-07-08 00:00:00 +0000
- title: 'Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and More'
abstract: 'Combinatorial optimization (CO) is naturally discrete, making machine-learning techniques based on differentiable optimization inapplicable. Karalias & Loukas (2020) adapted the probabilistic method by Erdős & Spencer (1974), to incorporate CO into differentiable optimization. Their work ignited the research on unsupervised learning for CO, composed of two main components: probabilistic objectives and derandomization. However, each component confronts unique challenges. First, deriving objectives under complex conditions and constraints is nontrivial. Second, the derandomization process is underexplored, and the existing derandomization methods are either random sampling or naive rounding. In this work, we aim to tackle complex conditions in unsupervised CO. First, we concretize the targets for probabilistic objective construction and derandomization with theoretical justification. Then, for various complex conditions commonly involved in different CO problems, we derive nontrivial objectives and derandomization to meet the targets. Finally, we apply the derivations to various CO problems. Via extensive experiments on synthetic and real-world graphs, we validate the correctness of our derivations and show our empirical superiority w.r.t. both optimization quality and speed.'
volume: 235
URL: https://proceedings.mlr.press/v235/bu24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bu24b/bu24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bu24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fanchen
family: Bu
- given: Hyeonsoo
family: Jo
- given: Soo Yong
family: Lee
- given: Sungsoo
family: Ahn
- given: Kijung
family: Shin
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4696-4729
id: bu24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4696
lastpage: 4729
published: 2024-07-08 00:00:00 +0000
- title: 'Differentially Private Bias-Term Fine-tuning of Foundation Models'
abstract: 'We study the problem of differentially private (DP) fine-tuning of large pre-trained models — a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraint, yet requires significant computational overhead or modifications to the network architecture. We propose differentially private bias-term fine-tuning (DP-BiTFiT), which matches the state-of-the-art accuracy for DP algorithms and the efficiency of the standard BiTFiT. DP-BiTFiT is model agnostic (not modifying the network architecture), parameter efficient (only training about 0.1% of the parameters), and computation efficient (almost removing the overhead caused by DP, in both the time and space complexity). On a wide range of tasks, DP-BiTFiT is 2 - 30X faster and uses 2 - 8X less memory than DP full fine-tuning, even faster than the standard full fine-tuning. This amazing efficiency enables us to conduct DP fine-tuning on language and vision tasks with long-sequence texts and high-resolution images, which were computationally difficult using existing methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/bu24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bu24c/bu24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bu24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhiqi
family: Bu
- given: Yu-Xiang
family: Wang
- given: Sheng
family: Zha
- given: George
family: Karypis
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4730-4751
id: bu24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4730
lastpage: 4751
published: 2024-07-08 00:00:00 +0000
- title: 'Bayesian Optimization of Function Networks with Partial Evaluations'
abstract: 'Bayesian optimization is a powerful framework for optimizing functions that are expensive or time-consuming to evaluate. Recent work has considered Bayesian optimization of function networks (BOFN), where the objective function is given by a network of functions, each taking as input the output of previous nodes in the network as well as additional parameters. Leveraging this network structure has been shown to yield significant performance improvements. Existing BOFN algorithms for general-purpose networks evaluate the full network at each iteration. However, many real-world applications allow for evaluating nodes individually. To exploit this, we propose a novel knowledge gradient acquisition function that chooses which node and corresponding inputs to evaluate in a cost-aware manner, thereby reducing query costs by evaluating only on a part of the network at each step. We provide an efficient approach to optimizing our acquisition function and show that it outperforms existing BOFN methods and other benchmarks across several synthetic and real-world problems. Our acquisition function is the first to enable cost-aware optimization of a broad class of function networks.'
volume: 235
URL: https://proceedings.mlr.press/v235/buathong24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/buathong24a/buathong24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-buathong24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Poompol
family: Buathong
- given: Jiayue
family: Wan
- given: Raul
family: Astudillo
- given: Sam
family: Daulton
- given: Maximilian
family: Balandat
- given: Peter I.
family: Frazier
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4752-4784
id: buathong24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4752
lastpage: 4784
published: 2024-07-08 00:00:00 +0000
- title: 'Robustness of Nonlinear Representation Learning'
abstract: 'We study the problem of unsupervised representation learning in slightly misspecified settings, and thus formalize the study of robustness of nonlinear representation learning. We focus on the case where the mixing is close to a local isometry in a suitable distance and show based on existing rigidity results that the mixing can be identified up to linear transformations and small errors. In a second step, we investigate Independent Component Analysis (ICA) with observations generated according to $x=f(s)=As+h(s)$ where $A$ is an invertible mixing matrix and $h$ a small perturbation. We show that we can approximately recover the matrix $A$ and the independent components. Together, these two results show approximate identifiability of nonlinear ICA with almost isometric mixing functions. Those results are a step towards identifiability results for unsupervised representation learning for real-world data that do not follow restrictive model classes.'
volume: 235
URL: https://proceedings.mlr.press/v235/buchholz24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/buchholz24a/buchholz24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-buchholz24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simon
family: Buchholz
- given: Bernhard
family: Schölkopf
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4785-4821
id: buchholz24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4785
lastpage: 4821
published: 2024-07-08 00:00:00 +0000
- title: 'Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts'
abstract: 'Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets, have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency at test time, which limits the scalability needed for low-resource devices and real-time applications. To resolve these computational issues, we propose Density-Softmax, a sampling-free deterministic framework via combining a density function built on a Lipschitz-constrained feature extractor with the softmax layer. Theoretically, we show that our model is the solution of minimax uncertainty risk and is distance-aware on feature space, thus reducing the over-confidence of the standard softmax under distribution shifts. Empirically, our method enjoys competitive results with state-of-the-art techniques in terms of uncertainty and robustness, while having a lower number of model parameters and a lower latency at test time.'
volume: 235
URL: https://proceedings.mlr.press/v235/bui24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bui24a/bui24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bui24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ha Manh
family: Bui
- given: Anqi
family: Liu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4822-4853
id: bui24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4822
lastpage: 4853
published: 2024-07-08 00:00:00 +0000
- title: 'Explaining Graph Neural Networks via Structure-aware Interaction Index'
abstract: 'The Shapley value is a prominent tool for interpreting black-box machine learning models thanks to its strong theoretical foundation. However, for models with structured inputs, such as graph neural networks, existing Shapley-based explainability approaches either focus solely on node-wise importance or neglect the graph structure when perturbing the input instance. This paper introduces the Myerson-Taylor interaction index that internalizes the graph structure into attributing the node values and the interaction values among nodes. Unlike the Shapley-based methods, the Myerson-Taylor index decomposes coalitions into components satisfying a pre-chosen connectivity criterion. We prove that the Myerson-Taylor index is the unique one that satisfies a system of five natural axioms accounting for graph structure and high-order interaction among nodes. Leveraging these properties, we propose Myerson-Taylor Structure-Aware Graph Explainer (MAGE), a novel explainer that uses the second-order Myerson-Taylor index to identify the most important motifs influencing the model prediction, both positively and negatively. Extensive experiments on various graph datasets and models demonstrate that our method consistently provides superior subgraph explanations compared to state-of-the-art methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/bui24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bui24b/bui24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bui24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ngoc
family: Bui
- given: Hieu Trung
family: Nguyen
- given: Viet Anh
family: Nguyen
- given: Rex
family: Ying
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4854-4883
id: bui24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4854
lastpage: 4883
published: 2024-07-08 00:00:00 +0000
- title: 'Assessing Large Language Models on Climate Information'
abstract: 'As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in critically relevant domains. We present a comprehensive evaluation framework, grounded in science communication research, to assess LLM responses to questions about climate change. Our framework emphasizes both presentational and epistemological adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. Our evaluation task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education. We evaluate several recent LLMs on a set of diverse climate questions. Our results point to a significant gap between surface and epistemological qualities of LLMs in the realm of climate communication.'
volume: 235
URL: https://proceedings.mlr.press/v235/bulian24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/bulian24a/bulian24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-bulian24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jannis
family: Bulian
- given: Mike S.
family: Schäfer
- given: Afra
family: Amini
- given: Heidi
family: Lam
- given: Massimiliano
family: Ciaramita
- given: Ben
family: Gaiarin
- given: Michelle
family: Chen Huebscher
- given: Christian
family: Buck
- given: Niels G.
family: Mede
- given: Markus
family: Leippold
- given: Nadine
family: Strauss
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4884-4935
id: bulian24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4884
lastpage: 4935
published: 2024-07-08 00:00:00 +0000
- title: 'Semantically-correlated memories in a dense associative model'
abstract: 'I introduce a novel associative memory model named *Correlated Dense Associative Memory* (CDAM), which integrates both auto- and hetero-association in a unified framework for continuous-valued memory patterns. Employing an arbitrary graph structure to semantically link memory patterns, CDAM is theoretically and numerically analysed, revealing four distinct dynamical modes: auto-association, narrow hetero-association, wide hetero-association, and neutral quiescence. Drawing inspiration from inhibitory modulation studies, I employ anti-Hebbian learning rules to control the range of hetero-association, extract multi-scale representations of community structures in graphs, and stabilise the recall of temporal sequences. Experimental demonstrations showcase CDAM’s efficacy in handling real-world data, replicating a classical neuroscience experiment, performing image retrieval, and simulating arbitrary finite automata.'
volume: 235
URL: https://proceedings.mlr.press/v235/burns24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/burns24a/burns24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-burns24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas F
family: Burns
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4936-4970
id: burns24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4936
lastpage: 4970
published: 2024-07-08 00:00:00 +0000
- title: 'Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision'
abstract: 'Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior—for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to *weakly supervise* superhuman models. We study an analogy to this problem: can weak model supervision elicit the full capabilities of a much stronger model? We test this using a range of pretrained language models in the GPT-4 family on natural language processing (NLP), chess, and reward modeling tasks. We find that when we naively finetune strong pretrained models on labels generated by a weak model, they consistently perform better than their weak supervisors, a phenomenon we call *weak-to-strong generalization*. However, we are still far from recovering the full capabilities of strong models with naive finetuning alone, suggesting that techniques like RLHF may scale poorly to superhuman models without further work. We find that simple methods can often significantly improve weak-to-strong generalization: for example, when finetuning GPT-4 with a GPT-2-level supervisor and an auxiliary confidence loss, we can recover close to GPT-3.5-level performance on NLP tasks. Our results suggest that it is feasible to make empirical progress today on a fundamental challenge of aligning superhuman models.'
volume: 235
URL: https://proceedings.mlr.press/v235/burns24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/burns24b/burns24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-burns24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Collin
family: Burns
- given: Pavel
family: Izmailov
- given: Jan Hendrik
family: Kirchner
- given: Bowen
family: Baker
- given: Leo
family: Gao
- given: Leopold
family: Aschenbrenner
- given: Yining
family: Chen
- given: Adrien
family: Ecoffet
- given: Manas
family: Joglekar
- given: Jan
family: Leike
- given: Ilya
family: Sutskever
- given: Jeffrey
family: Wu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 4971-5012
id: burns24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 4971
lastpage: 5012
published: 2024-07-08 00:00:00 +0000
- title: 'CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay'
abstract: 'Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach the ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the program output given input) to the output actually produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines. Our code is available at https://github.com/Qualcomm-AI-research/codeit.'
volume: 235
URL: https://proceedings.mlr.press/v235/butt24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/butt24a/butt24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-butt24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Natasha
family: Butt
- given: Blazej
family: Manczak
- given: Auke
family: Wiggers
- given: Corrado
family: Rainone
- given: David W.
family: Zhang
- given: Michaël
family: Defferrard
- given: Taco
family: Cohen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5013-5034
id: butt24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5013
lastpage: 5034
published: 2024-07-08 00:00:00 +0000
- title: 'How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers'
abstract: 'A main theoretical puzzle is why over-parameterized Neural Networks (NNs) generalize well when trained to zero loss (i.e., so they interpolate the data). Usually, the NN is trained with Stochastic Gradient Descent (SGD) or one of its variants. However, recent empirical work examined the generalization of a random NN that interpolates the data: the NN was sampled from a seemingly uniform prior over the parameters, conditioned on the NN perfectly classifying the training set. Interestingly, such an NN sample typically generalized as well as SGD-trained NNs. We prove that such a random NN interpolator typically generalizes well if there exists an underlying narrow “teacher NN" that agrees with the labels. Specifically, we show that such a ‘flat’ prior over the NN parametrization induces a rich prior over the NN functions, due to the redundancy in the NN structure. In particular, this creates a bias towards simpler functions, which require fewer relevant parameters to represent — enabling learning with a sample complexity approximately proportional to the complexity of the teacher (roughly, the number of non-redundant parameters), rather than the student’s.'
volume: 235
URL: https://proceedings.mlr.press/v235/buzaglo24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/buzaglo24a/buzaglo24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-buzaglo24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gon
family: Buzaglo
- given: Itamar
family: Harel
- given: Mor Shpigel
family: Nacson
- given: Alon
family: Brutzkus
- given: Nathan
family: Srebro
- given: Daniel
family: Soudry
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5035-5081
id: buzaglo24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5035
lastpage: 5081
published: 2024-07-08 00:00:00 +0000
- title: 'Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction'
abstract: 'We propose a novel regression adjustment method designed for estimating distributional treatment effect parameters in randomized experiments. Randomized experiments have been extensively used to estimate treatment effects in various scientific fields. However, to gain deeper insights, it is essential to estimate distributional treatment effects rather than relying solely on average effects. Our approach incorporates pre-treatment covariates into a distributional regression framework, utilizing machine learning techniques to improve the precision of distributional treatment effect estimators. The proposed approach can be readily implemented with off-the-shelf machine learning methods and remains valid as long as the nuisance components are reasonably well estimated. Also, we establish the asymptotic properties of the proposed estimator and present a uniformly valid inference method. Through simulation results and real data analysis, we demonstrate the effectiveness of integrating machine learning techniques in reducing the variance of distributional treatment effect estimators in finite samples.'
volume: 235
URL: https://proceedings.mlr.press/v235/byambadalai24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/byambadalai24a/byambadalai24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-byambadalai24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Undral
family: Byambadalai
- given: Tatsushi
family: Oka
- given: Shota
family: Yasui
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5082-5113
id: byambadalai24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5082
lastpage: 5113
published: 2024-07-08 00:00:00 +0000
- title: 'Learning Associative Memories with Gradient Descent'
abstract: 'This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and correlations between embeddings. Through theory and experiments, we provide several insights. In overparameterized regimes, we obtain logarithmic growth of the “classification margins.” Yet, we show that imbalance in token frequencies and memory interferences due to correlated embeddings lead to oscillatory transitory regimes. The oscillations are more pronounced with large step sizes, which can create benign loss spikes, although these learning rates speed up the dynamics and accelerate the asymptotic convergence. We also find that underparameterized regimes lead to suboptimal memorization schemes. Finally, we assess the validity of our findings on small Transformer models.'
volume: 235
URL: https://proceedings.mlr.press/v235/cabannes24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cabannes24a/cabannes24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cabannes24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vivien
family: Cabannes
- given: Berfin
family: Simsek
- given: Alberto
family: Bietti
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5114-5134
id: cabannes24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5114
lastpage: 5134
published: 2024-07-08 00:00:00 +0000
- title: 'Bridging Environments and Language with Rendering Functions and Vision-Language Models'
abstract: 'Vision-language models (VLMs) have tremendous potential for *grounding* language, and thus enabling *language-conditioned agents (LCAs)* to perform diverse tasks specified with text. This has motivated the study of LCAs based on reinforcement learning (RL) with rewards given by rendering images of an environment and evaluating those images with VLMs. If single-task RL is employed, such approaches are limited by the cost and time required to train a policy for each new task. Multi-task RL (MTRL) is a natural alternative, but requires a carefully designed corpus of training tasks and does not always generalize reliably to new tasks. Therefore, this paper introduces a novel decomposition of the problem of building an LCA: first find an *environment configuration* that has a high VLM score for text describing a task; then use a (pretrained) goal-conditioned policy to reach that configuration. We also explore several enhancements to the speed and quality of VLM-based LCAs, notably, the use of distilled models, and the evaluation of configurations from multiple viewpoints to resolve the ambiguities inherent in a single 2D view. We demonstrate our approach on the Humanoid environment, showing that it results in LCAs that outperform MTRL baselines in zero-shot generalization, without requiring any textual task descriptions or other forms of environment-specific annotation during training.'
volume: 235
URL: https://proceedings.mlr.press/v235/cachet24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cachet24a/cachet24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cachet24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Theo
family: Cachet
- given: Christopher R
family: Dance
- given: Olivier
family: Sigaud
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5135-5188
id: cachet24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5135
lastpage: 5188
published: 2024-07-08 00:00:00 +0000
- title: 'Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions'
abstract: 'In recent years, deep learning-based sequence modelings, such as language models, have received much attention and success, which pushes researchers to explore the possibility of transforming non-sequential problems into a sequential form. Following this thought, deep neural networks can be represented as composite functions of a sequence of mappings, linear or nonlinear, where each composition can be viewed as a word. However, the weights of linear mappings are undetermined and hence require an infinite number of words. In this article, we investigate the finite case and constructively prove the existence of a finite vocabulary $V$=$\phi_i: \mathbb{R}^d \to \mathbb{R}^d | i=1,...,n$ with $n=O(d^2)$ for the universal approximation. That is, for any continuous mapping $f: \mathbb{R}^d \to \mathbb{R}^d$, compact domain $\Omega$ and $\varepsilon>0$, there is a sequence of mappings $\phi_{i_1}, ..., \phi_{i_m} \in V, m \in \mathbb{Z}^+$, such that the composition $\phi_{i_m} \circ ... \circ \phi_{i_1} $ approximates $f$ on $\Omega$ with an error less than $\varepsilon$. Our results demonstrate an unusual approximation power of mapping compositions and motivate a novel compositional model for regular languages.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24a/cai24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yongqiang
family: Cai
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5189-5208
id: cai24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5189
lastpage: 5208
published: 2024-07-08 00:00:00 +0000
- title: 'Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads'
abstract: 'Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one’s output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator’s cache. While methods such as speculative decoding have been suggested to address this issue, their implementation is impeded by the challenges associated with acquiring and maintaining a separate draft model. In this paper, we present Medusa, an efficient method that augments LLM inference by adding extra decoding heads to predict multiple subsequent tokens in parallel. Using a tree-based attention mechanism, Medusa constructs multiple candidate continuations and verifies them simultaneously in each decoding step. By leveraging parallel processing, Medusa reduces the number of decoding steps required. We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration. Medusa-2: Medusa is fine-tuned together with the backbone LLM, enabling better prediction accuracy of Medusa heads and higher speedup but needing a special training recipe that preserves the model’s capabilities. Moreover, we propose several extensions that improve or expand the utility of Medusa, including a self-distillation to handle situations where no training data is available and a typical acceptance scheme to boost the acceptance rate while maintaining generation quality. We evaluate Medusa on models of various sizes and training procedures. Our experiments demonstrate that Medusa-1 can achieve over 2.2$\times$ speedup without compromising generation quality, while Medusa-2 further improves the speedup to 2.3-2.8$\times$.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24b/cai24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tianle
family: Cai
- given: Yuhong
family: Li
- given: Zhengyang
family: Geng
- given: Hongwu
family: Peng
- given: Jason D.
family: Lee
- given: Deming
family: Chen
- given: Tri
family: Dao
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5209-5235
id: cai24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5209
lastpage: 5235
published: 2024-07-08 00:00:00 +0000
- title: 'Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation'
abstract: 'Large-scale pretrained models have proven immensely valuable in handling data-intensive modalities like text and image. However, fine-tuning these models for certain specialized modalities, such as protein sequence and cosmic ray, poses challenges due to the significant modality discrepancy and scarcity of labeled data. In this paper, we propose an end-to-end method, **PaRe**, to enhance cross-modal fine-tuning, aiming to transfer a large-scale pretrained model to various target modalities. **PaRe** employs a gating mechanism to select key patches from both source and target data. Through a modality-agnostic **Pa**tch **Re**placement scheme, these patches are preserved and combined to construct data-rich intermediate modalities ranging from easy to hard. By gradually generating intermediate modalities, we can not only effectively bridge the modality gap to enhance stability and transferability of cross-modal fine-tuning, but also address the challenge of limited data in the target modality by leveraging enriched intermediate modality data. Compared with hand-designed, general-purpose, task-specific, and state-of-the-art cross-modal fine-tuning approaches, **PaRe** demonstrates superior performance across three challenging benchmarks, encompassing more than ten modalities.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24c/cai24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lincan
family: Cai
- given: Shuang
family: Li
- given: Wenxuan
family: Ma
- given: Jingxuan
family: Kang
- given: Binhui
family: Xie
- given: Zixun
family: Sun
- given: Chengwei
family: Zhu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5236-5257
id: cai24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5236
lastpage: 5257
published: 2024-07-08 00:00:00 +0000
- title: 'Batch and match: black-box variational inference with a score-based divergence'
abstract: 'Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose *batch and match* (BaM), an alternative approach to BBVI based on a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24d/cai24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Diana
family: Cai
- given: Chirag
family: Modi
- given: Loucas
family: Pillaud-Vivien
- given: Charles
family: Margossian
- given: Robert M.
family: Gower
- given: David
family: Blei
- given: Lawrence K.
family: Saul
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5258-5297
id: cai24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5258
lastpage: 5297
published: 2024-07-08 00:00:00 +0000
- title: 'Flextron: Many-in-One Flexible Large Language Model'
abstract: 'Training modern LLMs is extremely resource intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. The Flextron architecture utilizes a nested elastic structure to rapidly adapt to specific user-defined latency and accuracy targets during inference with no additional fine-tuning required. It is also input-adaptive, and can automatically route tokens through its sub-networks for improved performance and efficiency. We present a sample-efficient training method and associated routing algorithms for systematically transforming an existing trained LLM into a Flextron model. We evaluate Flextron on the GPT-3 and LLama-2 family of LLMs, and demonstrate superior performance over multiple end-to-end trained variants and other state-of-the-art elastic networks, all with a single pretraining run that consumes a mere 7.63% of the tokens used in the original pretraining.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24e/cai24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruisi
family: Cai
- given: Saurav
family: Muralidharan
- given: Greg
family: Heinrich
- given: Hongxu
family: Yin
- given: Zhangyang
family: Wang
- given: Jan
family: Kautz
- given: Pavlo
family: Molchanov
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5298-5311
id: cai24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5298
lastpage: 5311
published: 2024-07-08 00:00:00 +0000
- title: 'Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion'
abstract: 'We study constrained comonotone min-max optimization, a structured class of nonconvex-nonconcave min-max optimization problems, and their generalization to comonotone inclusion. In our first contribution, we extend the *Extra Anchored Gradient (EAG)* algorithm, originally proposed by Yoon and Ryu (2021) for unconstrained min-max optimization, to constrained comonotone min-max optimization and comonotone inclusion, achieving an optimal convergence rate of $O\left(\frac{1}{T}\right)$ among all first-order methods. Additionally, we prove that the algorithm’s iterations converge to a point in the solution set. In our second contribution, we extend the *Fast Extra Gradient (FEG)* algorithm, as developed by Lee and Kim (2021), to constrained comonotone min-max optimization and comonotone inclusion, achieving the same $O\left(\frac{1}{T}\right)$ convergence rate. This rate is applicable to the broadest set of comonotone inclusion problems yet studied in the literature. Our analyses are based on simple potential function arguments, which might be useful for analyzing other accelerated algorithms.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24f.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24f/cai24f.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yang
family: Cai
- given: Argyris
family: Oikonomou
- given: Weiqiang
family: Zheng
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5312-5347
id: cai24f
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5312
lastpage: 5347
published: 2024-07-08 00:00:00 +0000
- title: 'LoCoCo: Dropping In Convolutions for Long Context Compression'
abstract: 'This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for **Lo**ng **Co**ntext **Co**mpression (**LoCoCo**). LoCoCo employs only a fixed-size Key-Value (KV) cache, and can enhance efficiency in both inference and fine-tuning stages. Diverging from prior methods that selectively drop KV pairs based on heuristics, LoCoCo leverages a data-driven adaptive fusion technique, blending previous KV pairs with incoming tokens to minimize the loss of contextual information and ensure accurate attention modeling. This token integration is achieved through injecting one-dimensional convolutional kernels that dynamically calculate mixing weights for each KV cache slot. Designed for broad compatibility with existing LLM frameworks, LoCoCo allows for straightforward "drop-in" integration without needing architectural modifications, while incurring minimal tuning overhead. Experiments demonstrate that LoCoCo maintains consistently outstanding performance across various context lengths and can achieve a high context compression rate during both inference and fine-tuning phases. During inference, we successfully compressed up to $3482$ tokens into a $128$-size KV cache, while retaining comparable performance to the full sequence - an accuracy improvement of up to $0.2791$ compared to baselines at the same cache size. During post-training tuning, we also effectively extended the context length from 4K to 32K using a KV cache of fixed size 512, achieving performance similar to fine-tuning with entire sequences.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24g.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24g/cai24g.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruisi
family: Cai
- given: Yuandong
family: Tian
- given: Zhangyang
family: Wang
- given: Beidi
family: Chen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5348-5359
id: cai24g
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5348
lastpage: 5359
published: 2024-07-08 00:00:00 +0000
- title: 'On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box'
abstract: 'Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the most influential features in a to-be-explained decision. While determining feature attributions via gradients delivers promising results, the internal access required for acquiring gradients can be impractical under safety concerns, thus limiting the applicability of gradient-based approaches. In response to such limited flexibility, this paper presents GEEX (gradient-estimation-based explanation), a method that produces gradient-like explanations through only query-level access. The proposed approach holds a set of fundamental properties for attribution methods, which are rigorously proven mathematically, ensuring the quality of its explanations. In addition to the theoretical analysis, with a focus on image data, the experimental results empirically demonstrate the superiority of the proposed method over state-of-the-art black-box methods and its competitive performance compared to methods with full access.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24h.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24h/cai24h.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yi
family: Cai
- given: Gerhard
family: Wunder
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5360-5382
id: cai24h
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5360
lastpage: 5382
published: 2024-07-08 00:00:00 +0000
- title: 'Sample-specific Masks for Visual Reprogramming-based Prompting'
abstract: '*Visual reprogramming* (VR) is a prompting technique that aims to re-purpose a pre-trained model (e.g., a classifier on ImageNet) to target tasks (e.g., medical data prediction) by learning a *small-scale pattern* added into input images instead of tuning considerable parameters within the model. The location of the pattern within input samples is usually determined by a pre-defined mask *shared across all samples*. In this paper, we show that the shared mask potentially limits VR’s generalization and increases its approximation error due to the lack of sample-level adaptation. Motivated by this finding, we design a new framework for VR called *sample-specific multi-channel masks* (SMM). Specifically, SMM employs a lightweight ConvNet and patch-wise interpolation to generate sample-specific three-channel masks instead of a shared and pre-defined mask. Since we generate different masks for individual samples, SMM is theoretically shown to reduce approximation error for the target tasks compared with existing state-of-the-art VR methods. We also empirically demonstrate its performance gain on both ResNet and ViT. The success of SMM further highlights the broader applicability of VR in leveraging the latent knowledge of pre-trained models for various target tasks. Our code is available at https://github.com/tmlr-group/SMM.'
volume: 235
URL: https://proceedings.mlr.press/v235/cai24i.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cai24i/cai24i.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cai24i.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chengyi
family: Cai
- given: Zesheng
family: Ye
- given: Lei
family: Feng
- given: Jianzhong
family: Qi
- given: Feng
family: Liu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5383-5408
id: cai24i
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5383
lastpage: 5408
published: 2024-07-08 00:00:00 +0000
- title: 'Human Alignment of Large Language Models through Online Preference Optimisation'
abstract: 'Ensuring alignment of language model’s outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contribution is two-fold. First, we show the equivalence between two recent alignment methods, namely Identity Policy Optimisation (IPO) and Nash Mirror Descent (Nash-MD). Second, we introduce a generalisation of IPO, named IPO-MD, that leverages the regularised sampling approach proposed by Nash-MD. This equivalence may seem surprising at first sight, since IPO is an offline method whereas Nash-MD is an online method using a preference model. However, this equivalence can be proven when we consider the online version of IPO, that is when both generations are sampled by the online policy and annotated by a trained preference model. Optimising the IPO loss with such a stream of data then becomes equivalent to finding the Nash equilibrium of the preference model through self-play. Building on this equivalence, we introduce the IPO-MD algorithm that generates data with a mixture policy (between the online and reference policy), similarly to the general Nash-MD algorithm. We compare online-IPO and IPO-MD to different online versions of existing losses on preference data such as DPO and SLiC on a summarisation task.'
volume: 235
URL: https://proceedings.mlr.press/v235/calandriello24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/calandriello24a/calandriello24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-calandriello24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniele
family: Calandriello
- given: Zhaohan Daniel
family: Guo
- given: Remi
family: Munos
- given: Mark
family: Rowland
- given: Yunhao
family: Tang
- given: Bernardo
family: Avila Pires
- given: Pierre Harvey
family: Richemond
- given: Charline
family: Le Lan
- given: Michal
family: Valko
- given: Tianqi
family: Liu
- given: Rishabh
family: Joshi
- given: Zeyu
family: Zheng
- given: Bilal
family: Piot
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5409-5435
id: calandriello24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5409
lastpage: 5435
published: 2024-07-08 00:00:00 +0000
- title: 'Partially Stochastic Infinitely Deep Bayesian Neural Networks'
abstract: 'In this paper, we present Partially Stochastic Infinitely Deep Bayesian Neural Networks, a novel family of architectures that integrates partial stochasticity into the framework of infinitely deep neural networks. Our new class of architectures is designed to improve the computational efficiency of existing architectures at training and inference time. To do this, we leverage the advantages of partial stochasticity in the infinite-depth limit which include the benefits of full stochasticity e.g. robustness, uncertainty quantification, and memory efficiency, whilst improving their limitations around computational complexity. We present a variety of architectural configurations, offering flexibility in network design including different methods for weight partition. We also provide mathematical guarantees on the expressivity of our models by establishing that our network family qualifies as Universal Conditional Distribution Approximators. Lastly, empirical evaluations across multiple tasks show that our proposed architectures achieve better downstream task performance and uncertainty quantification than their counterparts while being significantly more efficient. The code can be found at https://github.com/Sergio20f/part_stoch_inf_deep.'
volume: 235
URL: https://proceedings.mlr.press/v235/calvo-ordonez24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/calvo-ordonez24a/calvo-ordonez24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-calvo-ordonez24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sergio
family: Calvo Ordoñez
- given: Matthieu
family: Meunier
- given: Francesco
family: Piatti
- given: Yuantao
family: Shi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5436-5452
id: calvo-ordonez24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5436
lastpage: 5452
published: 2024-07-08 00:00:00 +0000
- title: 'Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design'
abstract: 'Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models to be applied to multimodal continuous and discrete data problems. Our key insight is that the discrete equivalent of continuous space flow matching can be realized using Continuous Time Markov Chains. DFMs benefit from a simple derivation that includes discrete diffusion models as a specific instance while allowing improved performance over existing diffusion-based approaches. We utilize our DFMs method to build a multimodal flow-based modeling framework. We apply this capability to the task of protein co-design, wherein we learn a model for jointly generating protein structure and sequence. Our approach achieves state-of-the-art co-design performance while allowing the same multimodal model to be used for flexible generation of the sequence or structure.'
volume: 235
URL: https://proceedings.mlr.press/v235/campbell24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/campbell24a/campbell24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-campbell24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrew
family: Campbell
- given: Jason
family: Yim
- given: Regina
family: Barzilay
- given: Tom
family: Rainforth
- given: Tommi
family: Jaakkola
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5453-5512
id: campbell24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5453
lastpage: 5512
published: 2024-07-08 00:00:00 +0000
- title: 'Mimicking Better by Matching the Approximate Action Distribution'
abstract: 'In this paper, we introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations. MAAD utilizes a surrogate reward signal, which can be derived from various sources such as adversarial games, trajectory matching objectives, or optimal transport criteria. To compensate for the non-availability of expert actions, we rely on an inverse dynamics model that infers a plausible action distribution given the expert’s state-state transitions; we regularize the imitator’s policy by aligning it to the inferred action distribution. MAAD leads to significantly improved sample efficiency and stability. We demonstrate its effectiveness in a number of MuJoCo environments, both in the OpenAI Gym and the DeepMind Control Suite. We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods. Remarkably, MAAD often stands out as the sole method capable of attaining expert performance levels, underscoring its simplicity and efficacy.'
volume: 235
URL: https://proceedings.mlr.press/v235/candido-ramos24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/candido-ramos24a/candido-ramos24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-candido-ramos24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Joao
family: Candido Ramos
- given: Lionel
family: Blondé
- given: Naoya
family: Takeishi
- given: Alexandros
family: Kalousis
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5513-5532
id: candido-ramos24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5513
lastpage: 5532
published: 2024-07-08 00:00:00 +0000
- title: 'Graph Positional and Structural Encoder'
abstract: 'Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, rendering them essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for all graph prediction tasks is a challenging and unsolved problem. Here, we present the Graph Positional and Structural Encoder (GPSE), the first-ever graph encoder designed to capture rich PSE representations for augmenting any GNN. GPSE learns an efficient common latent representation for multiple PSEs, and is highly transferable: The encoder trained on a particular graph dataset can be used effectively on datasets drawn from markedly different distributions and modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly outperform those that employ explicitly computed PSEs, and at least match their performance in others. Our results pave the way for the development of foundational pre-trained graph encoders for extracting positional and structural information, and highlight their potential as a more powerful and efficient alternative to explicitly computed PSEs and existing self-supervised pre-training approaches. Our framework and pre-trained models are publicly available at https://github.com/G-Taxonomy-Workgroup/GPSE. For convenience, GPSE has also been integrated into the PyG library to facilitate downstream applications.'
volume: 235
URL: https://proceedings.mlr.press/v235/canturk24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/canturk24a/canturk24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-canturk24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Semih
family: Cantürk
- given: Renming
family: Liu
- given: Olivier
family: Lapointe-Gagné
- given: Vincent
family: Létourneau
- given: Guy
family: Wolf
- given: Dominique
family: Beaini
- given: Ladislav
family: Rampášek
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5533-5566
id: canturk24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5533
lastpage: 5566
published: 2024-07-08 00:00:00 +0000
- title: 'Successor Features for Efficient Multi-Subject Controlled Text Generation'
abstract: 'While large language models (LLMs) have achieved impressive performance in generating fluent and realistic text, controlling the generated text so that it exhibits properties such as safety, factuality, and non-toxicity remains challenging. Existing decoding-based controllable text generation methods are static in terms of the dimension of control; if the target subject is changed, they require new training. Moreover, it can quickly become prohibitive to concurrently control multiple subjects. To address these challenges, we first show that existing methods can be framed as a reinforcement learning problem, where an action-value function estimates the likelihood of a desired attribute appearing in the generated text. Then, we introduce a novel approach named SF-Gen, which leverages the concept of successor features to decouple the dynamics of LLMs from task-specific rewards. By employing successor features, our method proves to be memory-efficient and computationally efficient for both training and decoding, especially when dealing with multiple target subjects. To the best of our knowledge, our research represents the first application of successor features in text generation. In addition to its computational efficiency, the resultant language produced by our method is comparable to the SOTA (and outperforms baselines) in both control measures as well as language quality, which we demonstrate through a series of experiments in various controllable text generation tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/cao24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cao24a/cao24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cao24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Meng
family: Cao
- given: Mehdi
family: Fatemi
- given: Jackie Ck
family: Cheung
- given: Samira
family: Shabanian
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5567-5583
id: cao24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5567
lastpage: 5583
published: 2024-07-08 00:00:00 +0000
- title: 'Limited Preference Aided Imitation Learning from Imperfect Demonstrations'
abstract: 'Imitation learning mimics high-quality policies from expert data for sequential decision-making tasks. However, its efficacy is hindered in scenarios where optimal demonstrations are unavailable, and only imperfect demonstrations are present. To address this issue, introducing additional limited human preferences is a suitable approach as it can be obtained in a human-friendly manner, offering a promising way to learn the policy that exceeds the performance of imperfect demonstrations. In this paper, we propose a novel imitation learning (IL) algorithm, **P**reference **A**ided **I**mitation **L**earning from imperfect demonstrations (PAIL). Specifically, PAIL learns a preference reward by querying experts for limited preferences from imperfect demonstrations. This serves two purposes during training: 1) Reweighting imperfect demonstrations with the preference reward for higher quality. 2) Selecting explored trajectories with high cumulative preference rewards to augment imperfect demonstrations. The dataset with continuously improving quality empowers the performance of PAIL to transcend the initial demonstrations. Comprehensive empirical results across a synthetic task and two locomotion benchmarks show that PAIL surpasses baselines by **73.2%** and breaks through the performance bottleneck of imperfect demonstrations.'
volume: 235
URL: https://proceedings.mlr.press/v235/cao24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cao24b/cao24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cao24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xingchen
family: Cao
- given: Fan-Ming
family: Luo
- given: Junyin
family: Ye
- given: Tian
family: Xu
- given: Zhilong
family: Zhang
- given: Yang
family: Yu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5584-5607
id: cao24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5584
lastpage: 5607
published: 2024-07-08 00:00:00 +0000
- title: 'Predictive Dynamic Fusion'
abstract: 'Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and easily fall into suboptimal problems, yielding unreliability and instability. To address this issue, we propose a Predictive Dynamic Fusion (PDF) framework for multimodal learning. We proceed to reveal the multimodal fusion from a generalization perspective and theoretically derive the predictable Collaborative Belief (Co-Belief) with Mono- and Holo-Confidence, which provably reduces the upper bound of generalization error. Accordingly, we further propose a relative calibration strategy to calibrate the predicted Co-Belief for potential uncertainty. Extensive experiments on multiple benchmarks confirm our superiority. Our code is available at https://github.com/Yinan-Xia/PDF.'
volume: 235
URL: https://proceedings.mlr.press/v235/cao24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cao24c/cao24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cao24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bing
family: Cao
- given: Yinan
family: Xia
- given: Yi
family: Ding
- given: Changqing
family: Zhang
- given: Qinghua
family: Hu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5608-5628
id: cao24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5608
lastpage: 5628
published: 2024-07-08 00:00:00 +0000
- title: 'Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection'
abstract: 'Detecting out-of-distribution (OOD) samples is essential when deploying machine learning models in open-world scenarios. Zero-shot OOD detection, requiring no training on in-distribution (ID) data, has been possible with the advent of vision-language models like CLIP. Existing methods build a text-based classifier with only closed-set labels. However, this largely restricts the inherent capability of CLIP to recognize samples from a large and open label space. In this paper, we propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLM) to Envision potential Outlier Exposure, termed EOE, without access to any actual OOD data. Owing to better adaptation to open-world scenarios, EOE can be generalized to different tasks, including far, near, and fine-grained OOD detection. Technically, we design (1) LLM prompts based on visual similarity to generate potential outlier class labels specialized for OOD detection, as well as (2) a new score function based on potential outlier penalty to distinguish hard OOD samples effectively. Empirically, EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset. The code is publicly available at: https://github.com/tmlr-group/EOE.'
volume: 235
URL: https://proceedings.mlr.press/v235/cao24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cao24d/cao24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cao24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chentao
family: Cao
- given: Zhun
family: Zhong
- given: Zhanke
family: Zhou
- given: Yang
family: Liu
- given: Tongliang
family: Liu
- given: Bo
family: Han
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5629-5659
id: cao24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5629
lastpage: 5659
published: 2024-07-08 00:00:00 +0000
- title: 'Can a Few Decide for Many? The Metric Distortion of Sortition'
abstract: 'Recent works have studied the design of algorithms for selecting representative sortition panels. However, the most central question remains unaddressed: Do these panels reflect the entire population’s opinion? We present a positive answer by adopting the concept of metric distortion from computational social choice, which aims to quantify how much a panel’s decision aligns with the ideal decision of the population when preferences and agents lie on a metric space. We show that uniform selection needs only logarithmically many agents in terms of the number of alternatives to achieve almost optimal distortion. We also show that Fair Greedy Capture, a selection algorithm introduced recently by Ebadian and Micha (2024), matches uniform selection’s guarantees of almost optimal distortion and also achieves constant ex-post distortion, ensuring a “best of both worlds” performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/caragiannis24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/caragiannis24a/caragiannis24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-caragiannis24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ioannis
family: Caragiannis
- given: Evi
family: Micha
- given: Jannik
family: Peters
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5660-5679
id: caragiannis24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5660
lastpage: 5679
published: 2024-07-08 00:00:00 +0000
- title: 'Stealing part of a production language model'
abstract: 'We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \$20 USD, our attack extracts the entire projection matrix of OpenAI’s Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the GPT-3.5-turbo model, and estimate it would cost under \$2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.'
volume: 235
URL: https://proceedings.mlr.press/v235/carlini24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/carlini24a/carlini24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-carlini24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nicholas
family: Carlini
- given: Daniel
family: Paleka
- given: Krishnamurthy Dj
family: Dvijotham
- given: Thomas
family: Steinke
- given: Jonathan
family: Hayase
- given: A. Feder
family: Cooper
- given: Katherine
family: Lee
- given: Matthew
family: Jagielski
- given: Milad
family: Nasr
- given: Arthur
family: Conmy
- given: Eric
family: Wallace
- given: David
family: Rolnick
- given: Florian
family: Tramèr
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5680-5705
id: carlini24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5680
lastpage: 5705
published: 2024-07-08 00:00:00 +0000
- title: 'AI Alignment with Changing and Influenceable Reward Functions'
abstract: 'Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences, we introduce Dynamic Reward Markov Decision Processes (DR-MDPs), which explicitly model preference changes and the AI’s influence on them. We show that despite its convenience, the static-preference assumption may undermine the soundness of existing alignment techniques, leading them to implicitly reward AI systems for influencing user preferences in ways users may not truly want. We then explore potential solutions. First, we offer a unifying perspective on how an agent’s optimization horizon may partially help reduce undesirable AI influence. Then, we formalize different notions of AI alignment that account for preference change from the outset. Comparing the strengths and limitations of 8 such notions of alignment, we find that they all either err towards causing undesirable AI influence, or are overly risk-averse, suggesting that a straightforward solution to the problems of changing preferences may not exist. As there is no avoiding grappling with changing preferences in real-world settings, this makes it all the more important to handle these issues with care, balancing risks and capabilities. We hope our work can provide conceptual clarity and constitute a first step towards AI alignment practices which explicitly account for (and contend with) the changing and influenceable nature of human preferences.'
volume: 235
URL: https://proceedings.mlr.press/v235/carroll24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/carroll24a/carroll24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-carroll24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Micah
family: Carroll
- given: Davis
family: Foote
- given: Anand
family: Siththaranjan
- given: Stuart
family: Russell
- given: Anca
family: Dragan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5706-5756
id: carroll24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5706
lastpage: 5756
published: 2024-07-08 00:00:00 +0000
- title: 'Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback'
abstract: 'In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends. To this end, we study the recently proposed model of RL with Aggregate Bandit Feedback (RL-ABF), where the agent only observes the sum of rewards at the end of an episode instead of each reward individually. Prior work studied RL-ABF only in tabular settings, where the number of states is assumed to be small. In this paper, we extend ABF to linear function approximation and develop two efficient algorithms with near-optimal regret guarantees: a value-based optimistic algorithm built on a new randomization technique with a Q-functions ensemble, and a policy optimization algorithm that uses a novel hedging scheme over the ensemble.'
volume: 235
URL: https://proceedings.mlr.press/v235/cassel24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cassel24a/cassel24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cassel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Asaf
family: Cassel
- given: Haipeng
family: Luo
- given: Aviv
family: Rosenberg
- given: Dmitry
family: Sotnikov
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5757-5791
id: cassel24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5757
lastpage: 5791
published: 2024-07-08 00:00:00 +0000
- title: 'Online Learning under Budget and ROI Constraints via Weak Adaptivity'
abstract: 'We study online learning problems in which a decision maker has to make a sequence of costly decisions, with the goal of maximizing their expected reward while adhering to budget and return-on-investment (ROI) constraints. Existing primal-dual algorithms designed for constrained online learning problems under adversarial inputs rely on two fundamental assumptions. First, the decision maker must know beforehand the value of parameters related to the degree of strict feasibility of the problem (i.e. Slater parameters). Second, a strictly feasible solution to the offline optimization problem must exist at each round. Both requirements are unrealistic for practical applications such as bidding in online ad auctions. In this paper, we show how such assumptions can be circumvented by endowing standard primal-dual templates with *weakly adaptive* regret minimizers. This results in a “dual-balancing” framework which ensures that dual variables stay sufficiently small, even in the absence of knowledge about Slater’s parameter. We prove the first *best-of-both-worlds* no-regret guarantees which hold in absence of the two aforementioned assumptions, under stochastic and adversarial inputs. Finally, we show how to instantiate the framework to optimally bid in various mechanisms of practical relevance, such as first- and second-price auctions.'
volume: 235
URL: https://proceedings.mlr.press/v235/castiglioni24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/castiglioni24a/castiglioni24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-castiglioni24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matteo
family: Castiglioni
- given: Andrea
family: Celli
- given: Christian
family: Kroer
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5792-5816
id: castiglioni24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5792
lastpage: 5816
published: 2024-07-08 00:00:00 +0000
- title: 'How Smooth Is Attention?'
abstract: 'Self-attention and masked self-attention are at the heart of Transformers’ outstanding success. Still, our mathematical understanding of attention, in particular of its Lipschitz properties — which are key when it comes to analyzing robustness and expressive power — is incomplete. We provide a detailed study of the Lipschitz constant of self-attention in several practical scenarios, discussing the impact of the sequence length $n$ and layer normalization on the local Lipschitz constant of both unmasked and masked self-attention. In particular, we show that for inputs of length $n$ in any compact set, the Lipschitz constant of self-attention is bounded by $\sqrt{n}$ up to a constant factor and that this bound is tight for reasonable sequence lengths. When the sequence length $n$ is too large for the previous bound to be tight, which we refer to as the mean-field regime, we provide an upper bound and a matching lower bound which are independent of $n$. Our mean-field framework for masked self-attention is novel and of independent interest. Our experiments on pretrained and randomly initialized BERT and GPT-2 support our theoretical findings.'
volume: 235
URL: https://proceedings.mlr.press/v235/castin24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/castin24a/castin24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-castin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Valérie
family: Castin
- given: Pierre
family: Ablin
- given: Gabriel
family: Peyré
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5817-5840
id: castin24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5817
lastpage: 5840
published: 2024-07-08 00:00:00 +0000
- title: 'Hierarchical Integral Probability Metrics: A distance on random probability measures with low sample complexity'
abstract: 'Random probabilities are a key component to many nonparametric methods in Statistics and Machine Learning. To quantify comparisons between different laws of random probabilities several works are starting to use the elegant Wasserstein over Wasserstein distance. In this paper we prove that the infinite dimensionality of the space of probabilities drastically deteriorates its sample complexity, which is slower than any polynomial rate in the sample size. We propose a new distance that preserves many desirable properties of the former while achieving a parametric rate of convergence. In particular, our distance 1) metrizes weak convergence; 2) can be estimated numerically through samples with low complexity; 3) can be bounded analytically from above and below. The main ingredient are integral probability metrics, which lead to the name *hierarchical IPM*.'
volume: 235
URL: https://proceedings.mlr.press/v235/catalano24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/catalano24a/catalano24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-catalano24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marta
family: Catalano
- given: Hugo
family: Lavenant
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5841-5861
id: catalano24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5841
lastpage: 5861
published: 2024-07-08 00:00:00 +0000
- title: 'On the Implicit Bias of Adam'
abstract: 'In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, conversely, impede its reduction (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.'
volume: 235
URL: https://proceedings.mlr.press/v235/cattaneo24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cattaneo24a/cattaneo24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cattaneo24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matias D.
family: Cattaneo
- given: Jason Matthew
family: Klusowski
- given: Boris
family: Shigida
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5862-5906
id: cattaneo24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5862
lastpage: 5906
published: 2024-07-08 00:00:00 +0000
- title: 'Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts'
abstract: 'Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy. However, learning diverse skills is challenging in RL due to the commonly used Gaussian policy parameterization. We propose Diverse Skill Learning (Di-SkilL), an RL method for learning diverse skills using Mixture of Experts, where each expert formalizes a skill as a contextual motion primitive. Di-SkilL optimizes each expert and its associate context distribution to a maximum entropy objective that incentivizes learning diverse skills in similar contexts. The per-expert context distribution enables automatic curricula learning, allowing each expert to focus on its best-performing sub-region of the context space. To overcome hard discontinuities and multi-modalities without any prior knowledge of the environment’s unknown context probability space, we leverage energy-based models to represent the per-expert context distributions and demonstrate how we can efficiently train them using the standard policy gradient objective. We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.'
volume: 235
URL: https://proceedings.mlr.press/v235/celik24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/celik24a/celik24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-celik24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Onur
family: Celik
- given: Aleksandar
family: Taranovic
- given: Gerhard
family: Neumann
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5907-5933
id: celik24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5907
lastpage: 5933
published: 2024-07-08 00:00:00 +0000
- title: 'Centralized Selection with Preferences in the Presence of Biases'
abstract: 'This paper considers the scenario in which there are multiple institutions, each with a limited capacity for candidates, and candidates, each with preferences over the institutions. A central entity evaluates the utility of each candidate to the institutions, and the goal is to select candidates for each institution in a way that maximizes utility while also considering the candidates’ preferences. The paper focuses on the setting in which candidates are divided into multiple groups and the observed utilities of candidates in some groups are biased–systematically lower than their true utilities. The first result is that, in these biased settings, prior algorithms can lead to selections with sub-optimal true utility and significant discrepancies in the fraction of candidates from each group that get their preferred choices. Subsequently, an algorithm is presented along with proof that it produces selections that achieve near-optimal group fairness with respect to preferences while also nearly maximizing the true utility under distributional assumptions. Further, extensive empirical validation of these results in real-world and synthetic settings, in which the distributional assumptions may not hold, are presented.'
volume: 235
URL: https://proceedings.mlr.press/v235/celis24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/celis24a/celis24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-celis24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: L. Elisa
family: Celis
- given: Amit
family: Kumar
- given: Nisheeth K.
family: Vishnoi
- given: Andrew
family: Xu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5934-5981
id: celis24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5934
lastpage: 5981
published: 2024-07-08 00:00:00 +0000
- title: 'Using Left and Right Brains Together: Towards Vision and Language Planning'
abstract: 'Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision making capabilities on a variety of tasks. However, they inherently operate planning within the language space, lacking the vision and spatial imagination ability. In contrast, humans utilize both left and right hemispheres of the brain for language and visual planning during the thinking process. Therefore, we introduce a novel vision-language planning framework in this work to perform concurrent visual and language planning for tasks with inputs of any form. Our framework incorporates visual planning to capture intricate environmental details, while language planning enhances the logical coherence of the overall system. We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks. The results demonstrate the superior performance of our approach, indicating that the integration of visual and language planning yields better contextually aware task execution.'
volume: 235
URL: https://proceedings.mlr.press/v235/cen24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cen24a/cen24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jun
family: Cen
- given: Chenfei
family: Wu
- given: Xiao
family: Liu
- given: Shengming
family: Yin
- given: Yixuan
family: Pei
- given: Jinglong
family: Yang
- given: Qifeng
family: Chen
- given: Nan
family: Duan
- given: Jianguo
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 5982-6001
id: cen24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 5982
lastpage: 6001
published: 2024-07-08 00:00:00 +0000
- title: 'Feasibility Consistent Representation Learning for Safe Reinforcement Learning'
abstract: 'In the field of safe reinforcement learning (RL), finding a balance between satisfying safety constraints and optimizing reward performance presents a significant challenge. A key obstacle in this endeavor is the estimation of safety constraints, which is typically more difficult than estimating a reward metric due to the sparse nature of the constraint signals. To address this issue, we introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL). This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL. Leveraging self-supervised learning techniques and a more learnable safety metric, our approach enhances the policy learning and constraint estimation. Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.'
volume: 235
URL: https://proceedings.mlr.press/v235/cen24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cen24b/cen24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cen24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhepeng
family: Cen
- given: Yihang
family: Yao
- given: Zuxin
family: Liu
- given: Ding
family: Zhao
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6002-6019
id: cen24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6002
lastpage: 6019
published: 2024-07-08 00:00:00 +0000
- title: 'Simple Ingredients for Offline Reinforcement Learning'
abstract: 'Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, by leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer. In light of this finding, we conduct a large empirical study where we formulate and test several hypotheses to explain this failure. Surprisingly, we find that targeted scale, more than algorithmic considerations, is the key factor influencing performance. We show that simple methods like AWAC and IQL with increased policy size overcome the paradoxical failure modes from the inclusion of additional data in MOOD, and notably outperform prior state-of-the-art algorithms on the canonical D4RL benchmark.'
volume: 235
URL: https://proceedings.mlr.press/v235/cetin24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cetin24a/cetin24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cetin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Edoardo
family: Cetin
- given: Andrea
family: Tirinzoni
- given: Matteo
family: Pirotta
- given: Alessandro
family: Lazaric
- given: Yann
family: Ollivier
- given: Ahmed
family: Touati
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6020-6047
id: cetin24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6020
lastpage: 6047
published: 2024-07-08 00:00:00 +0000
- title: 'Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning'
abstract: 'We introduce a novel Pseudo-Negative Regularization (PNR) framework for effective continual self-supervised learning (CSSL). Our PNR leverages pseudo-negatives obtained through model-based augmentation in a way that newly learned representations may not contradict what has been learned in the past. Specifically, for the InfoNCE-based contrastive learning methods, we define symmetric pseudo-negatives obtained from current and previous models and use them in both main and regularization loss terms. Furthermore, we extend this idea to non-contrastive learning methods which do not inherently rely on negatives. For these methods, a pseudo-negative is defined as the output from the previous model for a differently augmented version of the anchor sample and is asymmetrically applied to the regularization term. Extensive experimental results demonstrate that our PNR framework achieves state-of-the-art performance in representation learning during CSSL by effectively balancing the trade-off between plasticity and stability.'
volume: 235
URL: https://proceedings.mlr.press/v235/cha24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cha24a/cha24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cha24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sungmin
family: Cha
- given: Kyunghyun
family: Cho
- given: Taesup
family: Moon
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6048-6065
id: cha24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6048
lastpage: 6065
published: 2024-07-08 00:00:00 +0000
- title: 'Auditing Private Prediction'
abstract: 'Differential privacy (DP) offers a theoretical upper bound on the potential privacy leakage of an algorithm, while empirical auditing establishes a practical lower bound. Auditing techniques exist for DP training algorithms. However, machine learning can also be made private at inference. We propose the first framework for auditing private prediction where we instantiate adversaries with varying poisoning and query capabilities. This enables us to study the privacy leakage of four private prediction algorithms: PATE (Papernot et al., 2016), CaPC (Choquette-Choo et al., 2020), PromptPATE (Duan et al., 2023), and Private-kNN (Zhu et al., 2020). To conduct our audit, we introduce novel techniques to empirically evaluate privacy leakage in terms of Renyi DP. Our experiments show that (i) the privacy analysis of private prediction can be improved, (ii) algorithms which are easier to poison lead to much higher privacy leakage, and (iii) the privacy leakage is significantly lower for adversaries without query control than those with full control.'
volume: 235
URL: https://proceedings.mlr.press/v235/chadha24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chadha24a/chadha24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chadha24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Karan
family: Chadha
- given: Matthew
family: Jagielski
- given: Nicolas
family: Papernot
- given: Christopher A.
family: Choquette-Choo
- given: Milad
family: Nasr
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6066-6092
id: chadha24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6066
lastpage: 6092
published: 2024-07-08 00:00:00 +0000
- title: 'Position: On the Possibilities of AI-Generated Text Detection'
abstract: 'Our study addresses the challenge of distinguishing human-written text from Large Language Model (LLM) outputs. We provide evidence that this differentiation is consistently feasible, except when human and machine text distributions are indistinguishable across their entire support. Employing information theory, we show that while detecting machine-generated text becomes harder as it nears human quality, it remains possible with adequate text data. We introduce guidelines on the required text data quantity, either through sample size or sequence length, for reliable AI text detection, through derivations of sample complexity bounds. This research paves the way for advanced detection methods. Our comprehensive empirical tests, conducted across various datasets (Xsum, Squad, IMDb, and Kaggle FakeNews) and with several state-of-the-art text generators (GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, Llama-2-70B-Chat-HF), assess the viability of enhanced detection methods against detectors like RoBERTa-Large/Base-Detector and GPTZero, with increasing sample sizes and sequence lengths. Our findings align with OpenAI’s empirical data related to sequence length, marking the first theoretical substantiation for these observations.'
volume: 235
URL: https://proceedings.mlr.press/v235/chakraborty24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chakraborty24a/chakraborty24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chakraborty24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Souradip
family: Chakraborty
- given: Amrit
family: Bedi
- given: Sicheng
family: Zhu
- given: Bang
family: An
- given: Dinesh
family: Manocha
- given: Furong
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6093-6115
id: chakraborty24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6093
lastpage: 6115
published: 2024-07-08 00:00:00 +0000
- title: 'MaxMin-RLHF: Alignment with Diverse Human Preferences'
abstract: 'Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. However, the single reward model overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting its insufficiency in representing diverse human preferences. Next, we propose to learn a mixture of reward models via an expectation-maximization algorithm and solve a MaxMin alignment objective inspired by the Egalitarian principle in social choice theory to better honor diverse human preferences. We present comprehensive experimental results on small-scale (GPT-2) and large-scale language models (with Tulu2-7B) and show the efficacy of the proposed approach in the presence of diversity among human preferences. We remark that our findings in this work are not only limited to language models but also extend to reinforcement learning in general.'
volume: 235
URL: https://proceedings.mlr.press/v235/chakraborty24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chakraborty24b/chakraborty24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chakraborty24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Souradip
family: Chakraborty
- given: Jiahao
family: Qiu
- given: Hui
family: Yuan
- given: Alec
family: Koppel
- given: Dinesh
family: Manocha
- given: Furong
family: Huang
- given: Amrit
family: Bedi
- given: Mengdi
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6116-6135
id: chakraborty24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6116
lastpage: 6135
published: 2024-07-08 00:00:00 +0000
- title: 'Dense Reward for Free in Reinforcement Learning from Human Feedback'
abstract: 'Reinforcement Learning from Human Feedback (RLHF) has been credited as the key advance that has allowed Large Language Models (LLMs) to effectively follow instructions and produce useful assistance. Classically, this involves generating completions from the LLM in response to a query before using a separate reward model to assign a score to the full completion. As an auto-regressive process, the LLM has to take many “actions” (selecting individual tokens) and only receives a single, sparse reward at the end of an episode, a setup that is known to be difficult to optimise in traditional reinforcement learning. In this work we leverage the fact that the reward model contains more information than just its scalar output, in particular, it calculates an attention map over tokens as part of the transformer architecture. We use these attention weights to redistribute the reward along the whole completion, effectively densifying the signal and highlighting the most important tokens, all without incurring extra computational cost or requiring any additional modelling. We demonstrate that, theoretically, this approach is equivalent to potential-based reward shaping, ensuring that the optimal policy remains unchanged. Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.'
volume: 235
URL: https://proceedings.mlr.press/v235/chan24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chan24a/chan24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alex James
family: Chan
- given: Hao
family: Sun
- given: Samuel
family: Holt
- given: Mihaela
family: Van Der Schaar
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6136-6154
id: chan24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6136
lastpage: 6154
published: 2024-07-08 00:00:00 +0000
- title: 'Scribble-Supervised Semantic Segmentation with Prototype-based Feature Augmentation'
abstract: 'Scribble-supervised semantic segmentation presents a cost-effective training method that utilizes annotations generated through scribbling. It is valued in attaining high performance while minimizing annotation costs, which has made it highly regarded among researchers. Scribble supervision propagates information from labeled pixels to the surrounding unlabeled pixels, enabling semantic segmentation for the entire image. However, existing methods often ignore the features of classified pixels during feature propagation. To address these limitations, this paper proposes a prototype-based feature augmentation method that leverages feature prototypes to augment scribble supervision. Experimental results demonstrate that our approach achieves state-of-the-art performance on the PASCAL VOC 2012 dataset in scribble-supervised semantic segmentation tasks. The code is available at https://github.com/TranquilChan/PFA.'
volume: 235
URL: https://proceedings.mlr.press/v235/chan24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chan24b/chan24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chan24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Guiyang
family: Chan
- given: Pengcheng
family: Zhang
- given: Hai
family: Dong
- given: Shunhui
family: Ji
- given: Bainian
family: Chen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6155-6169
id: chan24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6155
lastpage: 6169
published: 2024-07-08 00:00:00 +0000
- title: 'Feature Importance Disparities for Data Bias Investigations'
abstract: 'It is widely held that one cause of downstream bias in classifiers is bias present in the training data. Rectifying such biases may involve context-dependent interventions such as training separate models on subgroups, removing features with bias in the collection process, or even conducting real-world experiments to ascertain sources of bias. Despite the need for such data bias investigations, few automated methods exist to assist practitioners in these efforts. In this paper, we present one such method that given a dataset $X$ consisting of protected and unprotected features, outcomes $y$, and a regressor $h$ that predicts $y$ given $X$, outputs a tuple $(f_j, g)$, with the following property: $g$ corresponds to a subset of the training dataset $(X, y)$, such that the $j^{th}$ feature $f_j$ has much larger (or smaller) *influence* in the subgroup $g$, than on the dataset overall, which we call *feature importance disparity* (FID). We show across $4$ datasets and $4$ common feature importance methods of broad interest to the machine learning community that we can efficiently find subgroups with large FID values even over exponentially large subgroup classes and in practice these groups correspond to subgroups with potentially serious bias issues as measured by standard fairness metrics.'
volume: 235
URL: https://proceedings.mlr.press/v235/chang24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chang24a/chang24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chang24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Peter W
family: Chang
- given: Leor
family: Fishman
- given: Seth
family: Neel
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6170-6201
id: chang24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6170
lastpage: 6201
published: 2024-07-08 00:00:00 +0000
- title: 'Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting'
abstract: 'A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn’s algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting, by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals implicit assumptions on the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF’s parameter estimates. When IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.'
volume: 235
URL: https://proceedings.mlr.press/v235/chang24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chang24b/chang24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chang24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Serina
family: Chang
- given: Frederic
family: Koehler
- given: Zhaonan
family: Qu
- given: Jure
family: Leskovec
- given: Johan
family: Ugander
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6202-6252
id: chang24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6202
lastpage: 6252
published: 2024-07-08 00:00:00 +0000
- title: 'LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits'
abstract: 'In the realm of electronic and electrical engineering, automation of analog circuit design is increasingly vital given the complexity and customized requirements of modern applications. However, existing methods only develop search-based algorithms that require many simulation iterations to design a custom circuit topology, which is usually a time-consuming process. To this end, we introduce LaMAGIC, a pioneering language model-based topology generation model that leverages supervised finetuning for automated analog circuit design. LaMAGIC can efficiently generate an optimized circuit design from the custom specification in a single pass. Our approach involves a meticulous development and analysis of various input and output formulations for circuits. These formulations can ensure canonical representations of circuits and align with the autoregressive nature of LMs to effectively address the challenges of representing analog circuits as graphs. The experimental results show that LaMAGIC achieves a success rate of up to 96% under a strict tolerance of 0.01. We also examine the scalability and adaptability of LaMAGIC, specifically testing its performance on more complex circuits. Our findings reveal the enhanced effectiveness of our adjacency matrix-based circuit formulation with floating-point input, suggesting its suitability for handling intricate circuit designs. This research not only demonstrates the potential of language models in graph generation, but also builds a foundational framework for future explorations in automated analog circuit design.'
volume: 235
URL: https://proceedings.mlr.press/v235/chang24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chang24c/chang24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chang24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chen-Chia
family: Chang
- given: Yikang
family: Shen
- given: Shaoze
family: Fan
- given: Jing
family: Li
- given: Shun
family: Zhang
- given: Ningyuan
family: Cao
- given: Yiran
family: Chen
- given: Xin
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6253-6262
id: chang24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6253
lastpage: 6262
published: 2024-07-08 00:00:00 +0000
- title: 'MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion'
abstract: 'In this work, we propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting. Specifically, given a reference image, we aim to generate a person’s new images by controlling the poses and facial expressions while keeping the identity unchanged. To this end, we propose a two-stage training strategy to disentangle human motions and appearance (e.g., facial expressions, skin tone, and dressing), consisting of (1) the pre-training of an appearance-control block and (2) learning appearance-disentangled pose control. Our novel design enables robust appearance control over generated human images, including body, facial attributes, and even background. By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses without the need for additional fine-tuning. Moreover, the proposed model is easy to use and can be considered as a plug-in module/extension to Stable Diffusion. The project website is here. The code is available here.'
volume: 235
URL: https://proceedings.mlr.press/v235/chang24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chang24d/chang24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chang24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Di
family: Chang
- given: Yichun
family: Shi
- given: Quankai
family: Gao
- given: Hongyi
family: Xu
- given: Jessica
family: Fu
- given: Guoxian
family: Song
- given: Qing
family: Yan
- given: Yizhe
family: Zhu
- given: Xiao
family: Yang
- given: Mohammad
family: Soleymani
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6263-6285
id: chang24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6263
lastpage: 6285
published: 2024-07-08 00:00:00 +0000
- title: 'From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions'
abstract: 'Selective labels occur when label observations are subject to a decision-making process; e.g., diagnoses that depend on the administration of laboratory tests. We study a clinically-inspired selective label problem called disparate censorship, where labeling biases vary across subgroups and unlabeled individuals are imputed as “negative” (i.e., no diagnostic test = no illness). Machine learning models naively trained on such labels could amplify labeling bias. Inspired by causal models of selective labels, we propose Disparate Censorship Expectation-Maximization (DCEM), an algorithm for learning in the presence of disparate censorship. We theoretically analyze how DCEM mitigates the effects of disparate censorship on model performance. We validate DCEM on synthetic data, showing that it improves bias mitigation (area between ROC curves) without sacrificing discriminative performance (AUC) compared to baselines. We achieve similar results in a sepsis classification task using clinical data.'
volume: 235
URL: https://proceedings.mlr.press/v235/chang24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chang24e/chang24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chang24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Trenton
family: Chang
- given: Jenna
family: Wiens
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6286-6324
id: chang24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6286
lastpage: 6324
published: 2024-07-08 00:00:00 +0000
- title: 'On the Role of Edge Dependency in Graph Generative Models'
abstract: 'We investigate the trade-off between the representation power of graph generative models and model *overlap*, i.e., the degree to which the model generates diverse outputs versus regurgitating its training data. In particular, we delineate a nested hierarchy of graph generative models categorized into three levels of complexity: edge independent, node independent, and arbitrarily dependent models. This hierarchy encapsulates a wide range of prevalent methods. We derive theoretical bounds on the number of triangles and other short-length cycles producible by each level of the hierarchy, finding that more complex dependency structure allows an improved trade-off between representation power and overlap. We provide instances demonstrating the asymptotic optimality of our bounds. Furthermore, we introduce new generative models for each of the three hierarchical levels, leveraging dense subgraph discovery. Our evaluation, conducted on real-world datasets, focuses on assessing the output quality and overlap of our proposed models in comparison to other popular models. Our results indicate that our simple, interpretable models provide competitive baselines to popular generative models. Through this investigation, we offer a structured and robust evaluation scheme, thereby facilitating the development of models capable of generating accurate and edge-diverse graphs.'
volume: 235
URL: https://proceedings.mlr.press/v235/chanpuriya24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chanpuriya24a/chanpuriya24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chanpuriya24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sudhanshu
family: Chanpuriya
- given: Cameron N
family: Musco
- given: Konstantinos
family: Sotiropoulos
- given: Charalampos
family: Tsourakakis
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6325-6345
id: chanpuriya24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6325
lastpage: 6345
published: 2024-07-08 00:00:00 +0000
- title: 'Performance Bounds for Active Binary Testing with Information Maximization'
abstract: 'In many applications like experimental design, group testing, and medical diagnosis, the state of a random variable $Y$ is revealed by successively observing the outcomes of binary tests about $Y$. New tests are selected adaptively based on the history of outcomes observed so far. If the number of states of $Y$ is finite, the process ends when $Y$ can be predicted with a desired level of confidence or all available tests have been used. Finding the strategy that minimizes the expected number of tests needed to predict $Y$ is virtually impossible in most real applications. Therefore, the commonly used strategy is the greedy heuristic of Information Maximization (InfoMax) that selects tests sequentially in order of information gain. Despite its widespread use, existing guarantees on its performance are often vacuous when compared to its empirical efficiency. In this paper, for the first time to the best of our knowledge, we establish tight non-vacuous bounds on InfoMax’s performance. Our analysis is based on the assumption that at any iteration of the greedy strategy, there is always a binary test available whose conditional probability of being ‘true’, given the history, is within $\delta$ units of one-half. This assumption is motivated by practical applications where the available set of tests often satisfies this property for modest values of $\delta$, say, ${0.1 \leq \delta \leq 0.4}$. Specifically, we analyze two distinct scenarios: (i) all tests are functions of $Y$, and (ii) test outcomes are corrupted by a binary symmetric channel. For both cases, our bounds guarantee the near-optimal performance of InfoMax for modest $\delta$ values. It requires only a small multiplicative factor of the entropy of $Y$, in terms of the average number of tests needed to make accurate predictions.'
volume: 235
URL: https://proceedings.mlr.press/v235/chattopadhyay24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chattopadhyay24a/chattopadhyay24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chattopadhyay24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aditya
family: Chattopadhyay
- given: Benjamin David
family: Haeffele
- given: Rene
family: Vidal
- given: Donald
family: Geman
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6346-6371
id: chattopadhyay24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6346
lastpage: 6371
published: 2024-07-08 00:00:00 +0000
- title: 'Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation'
abstract: 'We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision processes. Notably, using only a target network or an over-parameterized model does not provide such a convergence guarantee. Additionally, we extend our results to learning with truncated trajectories, showing that convergence is achievable for all tasks with minor modifications, akin to value truncation for the final states in trajectories. Our primary result focuses on temporal difference estimation for prediction, providing high-probability value estimation error bounds and empirical analysis on Baird’s counterexample and a Four-room task. Furthermore, we explore the control setting, demonstrating that similar convergence conditions apply to Q-learning.'
volume: 235
URL: https://proceedings.mlr.press/v235/che24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/che24a/che24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-che24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fengdi
family: Che
- given: Chenjun
family: Xiao
- given: Jincheng
family: Mei
- given: Bo
family: Dai
- given: Ramki
family: Gummadi
- given: Oscar A
family: Ramirez
- given: Christopher K
family: Harris
- given: A. Rupam
family: Mahmood
- given: Dale
family: Schuurmans
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6372-6396
id: che24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6372
lastpage: 6396
published: 2024-07-08 00:00:00 +0000
- title: 'PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer'
abstract: 'Despite the recent advancements in offline RL, no unified algorithm could achieve superior performance across a broad range of tasks. Offline *value function learning*, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of solving credit assignment and extrapolation errors that accumulate as the horizon of the task grows. On the other hand, models that can perform well in long-horizon tasks are designed specifically for goal-conditioned tasks, which commonly perform worse than value function learning methods on short-horizon, dense-reward scenarios. To bridge this gap, we propose a hierarchical planner designed for offline RL called PlanDQ. PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals. At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals. Our experimental results suggest that PlanDQ can achieve superior or competitive performance on D4RL continuous control benchmark tasks as well as AntMaze, Kitchen, and Calvin as long-horizon tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24a/chen24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chang
family: Chen
- given: Junyeob
family: Baek
- given: Fei
family: Deng
- given: Kenji
family: Kawaguchi
- given: Caglar
family: Gulcehre
- given: Sungjin
family: Ahn
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6397-6412
id: chen24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6397
lastpage: 6412
published: 2024-07-08 00:00:00 +0000
- title: 'How Interpretable Are Interpretable Graph Neural Networks?'
abstract: 'Interpretable graph neural networks (XGNNs) are widely adopted in various scientific applications involving graph-structured data. Existing XGNNs predominantly adopt the attention-based mechanism to learn edge or node importance for extracting and making predictions with the interpretable subgraph. However, the representational properties and limitations of these methods remain inadequately explored. In this work, we present a theoretical framework that formulates interpretable subgraph learning with the multilinear extension of the subgraph distribution, coined as subgraph multilinear extension (SubMT). Extracting the desired interpretable subgraph requires an accurate approximation of SubMT, yet we find that the existing XGNNs can have a huge gap in fitting SubMT. Consequently, the SubMT approximation failure will lead to the degenerated interpretability of the extracted subgraphs. To mitigate the issue, we design a new XGNN architecture called Graph Multilinear neT (GMT), which is provably more powerful in approximating SubMT. We empirically validate our theoretical findings on a number of graph classification benchmarks. The results demonstrate that GMT outperforms the state-of-the-art by up to 10% in terms of both interpretability and generalizability across 12 regular and geometric graph benchmarks.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24b/chen24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yongqiang
family: Chen
- given: Yatao
family: Bian
- given: Bo
family: Han
- given: James
family: Cheng
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6413-6456
id: chen24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6413
lastpage: 6456
published: 2024-07-08 00:00:00 +0000
- title: 'Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning'
abstract: 'Causal effect estimation under networked interference is an important but challenging problem. Available parametric methods are limited in their model space, while previous semiparametric methods, e.g., leveraging neural networks to fit only one single nuisance function, may still encounter misspecification problems under networked interference without appropriate assumptions on the data generation process. To mitigate bias stemming from misspecification, we propose a novel doubly robust causal effect estimator under networked interference, by adapting the targeted learning technique to the training of neural networks. Specifically, we generalize the targeted learning technique into the networked interference setting and establish the condition under which an estimator achieves double robustness. Based on the condition, we devise an end-to-end causal effect estimator by transforming the identified theoretical condition into a targeted loss. Moreover, we provide a theoretical analysis of our designed estimator, revealing a faster convergence rate compared to a single nuisance model. Extensive experimental results on two real-world networks with semisynthetic data demonstrate the effectiveness of our proposed estimators.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24c/chen24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weilin
family: Chen
- given: Ruichu
family: Cai
- given: Zeqin
family: Yang
- given: Jie
family: Qiao
- given: Yuguang
family: Yan
- given: Zijian
family: Li
- given: Zhifeng
family: Hao
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6457-6485
id: chen24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6457
lastpage: 6485
published: 2024-07-08 00:00:00 +0000
- title: 'Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation'
abstract: 'We investigate the problem of explainability for machine learning models, focusing on Feature Attribution Methods (FAMs) that evaluate feature importance through perturbation tests. Despite their utility, FAMs struggle to distinguish the contributions of different features when their prediction changes are similar after perturbation. To enhance FAMs’ discriminative power, we introduce Feature Attribution with Necessity and Sufficiency (FANS), which finds a neighborhood of the input such that perturbed samples within this neighborhood have a high Probability of being a Necessity and Sufficiency (PNS) cause for the change in predictions, and uses this PNS as the importance of the feature. Specifically, FANS computes this PNS via a heuristic strategy for estimating the neighborhood and a perturbation test involving two stages (factual and interventional) for counterfactual reasoning. To generate counterfactual samples, we use a resampling-based approach on the observed samples to approximate the required conditional distribution. We demonstrate that FANS outperforms existing attribution methods on six benchmarks. Please refer to the source code via https://github.com/DMIRLAB-Group/FANS.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24d/chen24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xuexin
family: Chen
- given: Ruichu
family: Cai
- given: Zhengting
family: Huang
- given: Yuxuan
family: Zhu
- given: Julien
family: Horwood
- given: Zhifeng
family: Hao
- given: Zijian
family: Li
- given: José Miguel
family: Hernández-Lobato
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6486-6502
id: chen24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6486
lastpage: 6502
published: 2024-07-08 00:00:00 +0000
- title: 'InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models'
abstract: 'Large language models (LLMs) are instruction followers, but their performance varies under different instructions. It is challenging to create the best instruction, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. In each optimization step of the proposed method InstructZero, a soft prompt is converted into an instruction by the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, whose result is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24e/chen24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lichang
family: Chen
- given: Jiuhai
family: Chen
- given: Tom
family: Goldstein
- given: Heng
family: Huang
- given: Tianyi
family: Zhou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6503-6518
id: chen24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6503
lastpage: 6518
published: 2024-07-08 00:00:00 +0000
- title: 'MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective'
abstract: 'The growing richness of large-scale datasets has been crucial in driving the rapid advancement and wide adoption of machine learning technologies. The massive collection and usage of data, however, pose an increasing risk for people’s private and sensitive information due to either inadvertent mishandling or malicious exploitation. Besides legislative solutions, many technical approaches have been proposed towards data privacy protection. However, they bear various limitations such as leading to degraded data availability and utility, or relying on heuristics and lacking solid theoretical bases. To overcome these limitations, we propose a formal information-theoretic definition for this utility-preserving privacy protection problem, and design a data-driven learnable data transformation framework that is capable of selectively suppressing sensitive attributes from target datasets while preserving the other useful attributes, regardless of whether or not they are known in advance or explicitly annotated for preservation. We provide rigorous theoretical analyses on the operational bounds for our framework, and carry out comprehensive experimental evaluations using datasets of a variety of modalities, including facial images, voice audio clips, and human activity motion sensor signals. Results demonstrate the effectiveness and generalizability of our method under various configurations on a multitude of tasks. Our source code is available at this URL.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24f.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24f/chen24f.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yizhuo
family: Chen
- given: Chun-Fu
family: Chen
- given: Hsiang
family: Hsu
- given: Shaohan
family: Hu
- given: Marco
family: Pistoia
- given: Tarek F.
family: Abdelzaher
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6519-6538
id: chen24f
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6519
lastpage: 6538
published: 2024-07-08 00:00:00 +0000
- title: 'Policy-conditioned Environment Models are More Generalizable'
abstract: 'In reinforcement learning, it is crucial to have an accurate environment dynamics model to evaluate different policies’ value in downstream tasks like offline policy optimization and policy evaluation. However, the learned model is known to be inaccurate in predictions when evaluating target policies different from data-collection policies. In this work, we found that utilizing policy representation for model learning, called policy-conditioned model (PCM) learning, is useful to mitigate the problem, especially when the offline dataset is collected from diversified behavior policies. The reason is that, in this case, PCM becomes a meta-dynamics model that is trained to be aware of and focus on the evaluation policies, adjusting the model on the fly to suit the evaluation policies’ state-action distribution and thus improving the prediction accuracy. Based on that intuition, we propose an easy-to-implement yet effective algorithm of PCM for accurate model learning. We also give a theoretical analysis and experimental evidence to demonstrate the feasibility of reducing value gaps by adapting the dynamics model under different policies. Experiment results show that PCM outperforms the existing SOTA off-policy evaluation methods in the DOPE benchmark by a large margin, and derives significantly better policies in offline policy selection and model predictive control compared with the standard model learning method.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24g.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24g/chen24g.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruifeng
family: Chen
- given: Xiong-Hui
family: Chen
- given: Yihao
family: Sun
- given: Siyuan
family: Xiao
- given: Minhui
family: Li
- given: Yang
family: Yu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6539-6561
id: chen24g
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6539
lastpage: 6561
published: 2024-07-08 00:00:00 +0000
- title: 'MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark'
abstract: 'Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence. However, assessing the utility of MLLMs presents considerable challenges, primarily due to the absence of multimodal benchmarks that align with human preferences. Drawing inspiration from the concept of LLM-as-a-Judge within LLMs, this paper introduces a novel benchmark, termed MLLM-as-a-Judge, to assess the ability of MLLMs in assisting judges across diverse modalities, encompassing three distinct tasks: Scoring Evaluation, Pair Comparison, and Batch Ranking. Our study reveals that, while MLLMs demonstrate remarkable human-like discernment in Pair Comparisons, there is a significant divergence from human preferences in Scoring Evaluation and Batch Ranking tasks. Furthermore, a closer examination reveals persistent challenges in the evaluative capacities of LLMs, including diverse biases, hallucinatory responses, and inconsistencies in judgment, even in advanced models such as GPT-4V. These findings emphasize the pressing need for enhancements and further research efforts to be undertaken before regarding MLLMs as fully reliable evaluators. In light of this, we advocate for additional efforts dedicated to supporting the continuous development within the domain of MLLM functioning as judges. The code and dataset are publicly available at our project homepage: https://mllm-judge.github.io/.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24h.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24h/chen24h.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dongping
family: Chen
- given: Ruoxi
family: Chen
- given: Shilin
family: Zhang
- given: Yaochen
family: Wang
- given: Yinuo
family: Liu
- given: Huichi
family: Zhou
- given: Qihui
family: Zhang
- given: Yao
family: Wan
- given: Pan
family: Zhou
- given: Lichao
family: Sun
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6562-6595
id: chen24h
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6562
lastpage: 6595
published: 2024-07-08 00:00:00 +0000
- title: 'Premise Order Matters in Reasoning with Large Language Models'
abstract: 'Large language models (LLMs) have accomplished remarkable reasoning performance in various domains. However, in the domain of reasoning tasks, we discover a frailty: LLMs are surprisingly brittle to the ordering of the premises, despite the fact that such ordering does not alter the underlying task. In particular, we observe that LLMs achieve the best performance when the premise order aligns with the context required in intermediate reasoning steps. For example, in deductive reasoning tasks, presenting the premises in the same order as the ground truth proof in the prompt (as opposed to random ordering) drastically increases the model’s accuracy. We first examine the effect of premise ordering on deductive reasoning on a variety of LLMs, and our evaluation shows that even if the model performance is decent on the optimal order, permuting the premise order can cause a performance drop of over 30%. In addition, we release the benchmark R-GSM, based on GSM8K, to examine the ordering effect for mathematical problem-solving, and we again observe a significant drop in accuracy, relative to the original GSM8K benchmark.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24i.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24i/chen24i.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24i.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xinyun
family: Chen
- given: Ryan Andrew
family: Chi
- given: Xuezhi
family: Wang
- given: Denny
family: Zhou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6596-6620
id: chen24i
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6596
lastpage: 6620
published: 2024-07-08 00:00:00 +0000
- title: 'Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models'
abstract: 'Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM’s performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24j.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24j/chen24j.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24j.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zixiang
family: Chen
- given: Yihe
family: Deng
- given: Huizhuo
family: Yuan
- given: Kaixuan
family: Ji
- given: Quanquan
family: Gu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6621-6642
id: chen24j
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6621
lastpage: 6642
published: 2024-07-08 00:00:00 +0000
- title: 'Robust Classification via a Single Diffusion Model'
abstract: 'Diffusion models have been applied to improve adversarial robustness of image classifiers by purifying the adversarial noises or generating realistic data for adversarial training. However, diffusion-based purification can be evaded by stronger adaptive attacks while adversarial training does not perform well under unseen threats, exhibiting inevitable limitations of these methods. To better harness the expressive power of diffusion models, this paper proposes Robust Diffusion Classifier (RDC), a generative classifier that is constructed from a pre-trained diffusion model to be adversarially robust. RDC first maximizes the data likelihood of a given input and then predicts the class probabilities of the optimized input using the conditional likelihood estimated by the diffusion model through Bayes’ theorem. To further reduce the computational cost, we propose a new diffusion backbone called multi-head diffusion and develop efficient sampling strategies. As RDC does not require training on particular adversarial attacks, we demonstrate that it is more generalizable to defend against multiple unseen threats. In particular, RDC achieves $75.67\%$ robust accuracy against various $\ell_\infty$ norm-bounded adaptive attacks with $\epsilon_\infty=8/255$ on CIFAR-10, surpassing the previous state-of-the-art adversarial training models by $+4.77\%$. The results highlight the potential of generative classifiers by employing pre-trained diffusion models for adversarial robustness compared with the commonly studied discriminative classifiers.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24k.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24k/chen24k.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24k.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Huanran
family: Chen
- given: Yinpeng
family: Dong
- given: Zhengyi
family: Wang
- given: Xiao
family: Yang
- given: Chengqi
family: Duan
- given: Hang
family: Su
- given: Jun
family: Zhu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6643-6665
id: chen24k
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6643
lastpage: 6665
published: 2024-07-08 00:00:00 +0000
- title: 'Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective'
abstract: 'Foundation Models (FMs) have demonstrated remarkable insights into the relational dynamics of the world, leading to the crucial question: *how do these models acquire an understanding of world hybrid relations?* Traditional statistical learning, particularly for prediction problems, may overlook the rich and inherently structured information from the data, especially regarding the relationships between objects. We introduce a mathematical model that formalizes relational learning as hypergraph recovery to study pre-training of FMs. In our framework, the world is represented as a hypergraph, with data abstracted as random samples from hyperedges. We theoretically examine the feasibility of a Pre-Trained Model (PTM) to recover this hypergraph and analyze the data efficiency in a minimax near-optimal style. By integrating rich graph theories into the realm of PTMs, our mathematical framework offers powerful tools for an in-depth understanding of pre-training from a unique perspective and can be used under various scenarios. As an example, we extend the framework to entity alignment in multimodal learning.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24l.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24l/chen24l.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24l.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yang
family: Chen
- given: Cong
family: Fang
- given: Zhouchen
family: Lin
- given: Bing
family: Liu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6666-6698
id: chen24l
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6666
lastpage: 6698
published: 2024-07-08 00:00:00 +0000
- title: 'Towards AutoAI: Optimizing a Machine Learning System with Black-box and Differentiable Components'
abstract: '*Machine learning* (ML) models in the real world typically do not exist in isolation. They are usually part of a complex system (e.g., healthcare systems, self-driving cars) containing multiple ML and *black-box* components. The problem of optimizing such systems, which we refer to as *automated AI* (AutoAI), requires us to *jointly* train all ML components together and presents a significant challenge because the number of system parameters is extremely high and the system has no analytical form. To circumvent this, we introduce a novel algorithm called A-BAD-BO which uses each ML component’s local loss as an auxiliary indicator for system performance. A-BAD-BO uses *Bayesian optimization* (BO) to optimize the local loss configuration of a system in a smaller dimensional space and exploits the differentiable structure of ML components to recover optimal system parameters from the optimized configuration. We show A-BAD-BO converges to optimal system parameters by showing that it is *asymptotically no regret*. We use A-BAD-BO to optimize several synthetic and real-world complex systems, including a prompt engineering pipeline for *large language models* containing millions of system parameters. Our results demonstrate that A-BAD-BO yields better system optimality than gradient-driven baselines and is more sample-efficient than pure BO algorithms.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24m.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24m/chen24m.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24m.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhiliang
family: Chen
- given: Chuan-Sheng
family: Foo
- given: Bryan Kian Hsiang
family: Low
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6699-6727
id: chen24m
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6699
lastpage: 6727
published: 2024-07-08 00:00:00 +0000
- title: 'Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes'
abstract: 'We propose a framework for probabilistic forecasting of dynamical systems based on generative modeling. Given observations of the system state over time, we formulate the forecasting problem as sampling from the conditional distribution of the future system state given its current state. To this end, we leverage the framework of stochastic interpolants, which facilitates the construction of a generative model between an arbitrary base distribution and the target. We design a fictitious, non-physical stochastic dynamics that takes as initial condition the current system state and produces as output a sample from the target conditional distribution in finite time and without bias. This process therefore maps a point mass centered at the current state onto a probabilistic ensemble of forecasts. We prove that the drift coefficient entering the stochastic differential equation (SDE) achieving this task is non-singular, and that it can be learned efficiently by square loss regression over the time-series data. We show that the drift and the diffusion coefficients of this SDE can be adjusted after training, and that a specific choice that minimizes the impact of the estimation error gives a Föllmer process. We highlight the utility of our approach on several complex, high-dimensional forecasting problems, including stochastically forced Navier-Stokes and video prediction on the KTH and CLEVRER datasets. The code is available at https://github.com/interpolants/forecasting.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24n.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24n/chen24n.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24n.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yifan
family: Chen
- given: Mark
family: Goldstein
- given: Mengjian
family: Hua
- given: Michael Samuel
family: Albergo
- given: Nicholas Matthew
family: Boffi
- given: Eric
family: Vanden-Eijnden
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6728-6756
id: chen24n
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6728
lastpage: 6756
published: 2024-07-08 00:00:00 +0000
- title: 'CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding'
abstract: 'Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on PC theory, emulating its two core mechanisms: correcting predictions from residuals and hierarchical learning. However, these models do not show improved prediction skill on real-world forecasting tasks, and ignore the Precision Weighting mechanism of PC theory. Precision weighting posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrates the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models, and allocates guidance with precision weights estimated from the inherent properties of diffusion models. We experimentally show that the precision weights are an estimator of the model’s predictability on the rigid body and fluid motion datasets. We also apply CogDPM to real-world prediction tasks using the U.K. precipitation and ERA surface wind datasets. Our results demonstrate that CogDPM outperforms both existing domain-specific operational models and general deep prediction models in providing more proficient forecasting.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24o.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24o/chen24o.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24o.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kaiyuan
family: Chen
- given: Xingzhuo
family: Guo
- given: Yu
family: Zhang
- given: Jianmin
family: Wang
- given: Mingsheng
family: Long
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6757-6775
id: chen24o
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6757
lastpage: 6775
published: 2024-07-08 00:00:00 +0000
- title: 'On Interpolating Experts and Multi-Armed Bandits'
abstract: 'Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,…,m_K)\in \mathbb N^K$, an instance of $\mathbf m$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf m$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf m$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf m$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\varepsilon,0.05)$-PAC algorithm of $\mathbf m$-BAI is $\Theta\left(\frac{1}{\varepsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf m$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the *clique cover* and related graph parameters. As a consequence, we obtain tight minimax regret bounds for several families of feedback graphs.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24p.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24p/chen24p.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24p.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Houshuang
family: Chen
- given: Yuchen
family: He
- given: Chihao
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6776-6802
id: chen24p
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6776
lastpage: 6802
published: 2024-07-08 00:00:00 +0000
- title: 'Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise'
abstract: 'We investigate both the theoretical and algorithmic aspects of likelihood-based methods for recovering a complex-valued signal from multiple sets of measurements, referred to as looks, affected by speckle (multiplicative) noise. Our theoretical contributions include establishing the first theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator under the deep image prior hypothesis. Our theoretical results capture the dependence of MSE upon the number of parameters in the deep image prior, the number of looks, the signal dimension, and the number of measurements per look. On the algorithmic side, we introduce the concept of bagged Deep Image Priors (Bagged-DIP) and integrate them with projected gradient descent (PGD). Furthermore, we show how employing the Newton-Schulz algorithm for calculating matrix inverses within the iterations of PGD reduces the computational complexity of the algorithm. We show that this method achieves state-of-the-art performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24q.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24q/chen24q.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24q.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xi
family: Chen
- given: Zhewen
family: Hou
- given: Christopher
family: Metzler
- given: Arian
family: Maleki
- given: Shirin
family: Jalali
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6803-6832
id: chen24q
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6803
lastpage: 6832
published: 2024-07-08 00:00:00 +0000
- title: 'Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers'
abstract: 'In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms. We mathematically demonstrate the equivalence between a model with ICL demonstration prompts and the same model with the additional bias terms. Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner. Existing methods are not exact and require expensive parameter updates. We demonstrate the efficacy of our approach through experiments that show the exact incorporation of ICL tokens into a linear transformer. We further suggest how our method can be adapted to achieve cheap approximate conversion of ICL tokens, even in regular transformer networks that are not linearized. Our experiments on GPT-2 show that, even though the conversion is only approximate, the model still gains valuable context from the included bias terms.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24r.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24r/chen24r.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24r.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brian K
family: Chen
- given: Tianyang
family: Hu
- given: Hui
family: Jin
- given: Hwee Kuan
family: Lee
- given: Kenji
family: Kawaguchi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6833-6846
id: chen24r
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6833
lastpage: 6846
published: 2024-07-08 00:00:00 +0000
- title: 'Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces'
abstract: 'Robust Markov decision process (robust MDP) is an important machine learning framework for learning a reliable policy that is robust to environmental perturbation. Despite the empirical success and popularity of policy gradient methods, existing policy gradient methods require at least iteration complexity $\mathcal{O}(\epsilon^{-4})$ to converge to the global optimal solution of s-rectangular robust MDPs with $\epsilon$-accuracy, and are limited to the deterministic setting with access to exact gradients and small state spaces, which is impractical in many applications. In this work, we propose an accelerated policy gradient algorithm with iteration complexity $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$ in the deterministic setting using entropy regularization. Furthermore, we extend this algorithm to the stochastic setting with access only to stochastic gradients and large state spaces, achieving the sample complexity $\mathcal{O}(\epsilon^{-7}\ln\epsilon^{-1})$. Meanwhile, our algorithms are also the first scalable policy gradient methods for entropy-regularized robust MDPs, which provide an important but underexplored machine learning framework.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24s.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24s/chen24s.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24s.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ziyi
family: Chen
- given: Heng
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6847-6880
id: chen24s
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6847
lastpage: 6880
published: 2024-07-08 00:00:00 +0000
- title: 'Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning'
abstract: 'Various acceleration approaches for Policy Gradient (PG) have been analyzed within the realm of Reinforcement Learning (RL). However, the theoretical understanding of the widely used momentum-based acceleration method on PG remains largely open. In response to this gap, we adapt the celebrated Nesterov’s accelerated gradient (NAG) method to policy optimization in RL, termed *Accelerated Policy Gradient* (APG). To demonstrate the potential of APG in achieving fast convergence, we formally prove that with the true gradient and under the softmax policy parametrization, APG converges to an optimal policy at rates: (i) $\tilde{O}(1/t^2)$ with nearly constant step sizes; (ii) $O(e^{-ct})$ with time-varying step sizes. To the best of our knowledge, this is the first characterization of the convergence rates of NAG in the context of RL. Notably, our analysis relies on one interesting finding: Regardless of the parameter initialization, APG ends up entering a locally nearly-concave regime, where APG can significantly benefit from the momentum, within finite iterations. Through numerical validation and experiments on the Atari 2600 benchmarks, we confirm that APG exhibits a $\tilde{O}(1/t^2)$ rate with nearly constant step sizes and a linear convergence rate with time-varying step sizes, significantly improving convergence over the standard PG.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24t.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24t/chen24t.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24t.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yen-Ju
family: Chen
- given: Nai-Chieh
family: Huang
- given: Ching-Pei
family: Lee
- given: Ping-Chun
family: Hsieh
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6881-6949
id: chen24t
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6881
lastpage: 6949
published: 2024-07-08 00:00:00 +0000
- title: 'From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning'
abstract: 'Large Language Models (LLMs) tend to prioritize adherence to user prompts over providing veracious responses, leading to the sycophancy issue. When challenged by users, LLMs tend to admit mistakes and provide inaccurate responses even if they initially provided the correct answer. Recent works propose to employ supervised fine-tuning (SFT) to mitigate the sycophancy issue, though it typically leads to the degeneration of LLMs’ general capability. To address the challenge, we propose a novel supervised pinpoint tuning (SPT), where the region-of-interest modules are tuned for a given objective. Specifically, SPT first reveals and verifies a small percentage ($<$5%) of the basic modules, which significantly affect a particular behavior of LLMs, i.e., sycophancy. Subsequently, SPT merely fine-tunes these identified modules while freezing the rest. To verify the effectiveness of the proposed SPT, we conduct comprehensive experiments, demonstrating that SPT significantly mitigates the sycophancy issue of LLMs (even better than SFT). Moreover, SPT introduces limited or even no side effects on the general capability of LLMs. Our results shed light on how to precisely, effectively, and efficiently explain and improve the targeted ability of LLMs.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24u.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24u/chen24u.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24u.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wei
family: Chen
- given: Zhen
family: Huang
- given: Liang
family: Xie
- given: Binbin
family: Lin
- given: Houqiang
family: Li
- given: Le
family: Lu
- given: Xinmei
family: Tian
- given: Deng
family: Cai
- given: Yonggang
family: Zhang
- given: Wenxiao
family: Wang
- given: Xu
family: Shen
- given: Jieping
family: Ye
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6950-6972
id: chen24u
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6950
lastpage: 6972
published: 2024-07-08 00:00:00 +0000
- title: 'Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy'
abstract: 'We study $L_2$ mean estimation under central differential privacy and communication constraints, and address two key challenges: firstly, existing mean estimation schemes that simultaneously handle both constraints are usually optimized for $L_\infty$ geometry and rely on random rotation or Kashin’s representation to adapt to $L_2$ geometry, resulting in suboptimal leading constants in mean square errors (MSEs); secondly, schemes achieving order-optimal communication-privacy trade-offs do not extend seamlessly to streaming differential privacy (DP) settings (e.g., tree aggregation or matrix factorization), rendering them incompatible with DP-FTRL type optimizers. In this work, we tackle these issues by introducing a novel privacy accounting method for the sparsified Gaussian mechanism that incorporates the randomness inherent in sparsification into the DP noise. Unlike previous approaches, our accounting algorithm directly operates in $L_2$ geometry, yielding MSEs that converge quickly to those of the uncompressed Gaussian mechanism. Additionally, we extend the sparsification scheme to the matrix factorization framework under streaming DP and provide a precise accountant tailored for DP-FTRL type optimizers. Empirically, our method demonstrates at least a 100x improvement in compression for DP-SGD across various FL tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24v.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24v/chen24v.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24v.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wei-Ning
family: Chen
- given: Berivan
family: Isik
- given: Peter
family: Kairouz
- given: Albert
family: No
- given: Sewoong
family: Oh
- given: Zheng
family: Xu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6973-6991
id: chen24v
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6973
lastpage: 6991
published: 2024-07-08 00:00:00 +0000
- title: 'Offline Transition Modeling via Contrastive Energy Learning'
abstract: 'Learning a high-quality transition model is of great importance for sequential decision-making tasks, especially in offline settings. Nevertheless, the complex behaviors of transition dynamics in real-world environments pose challenges for the standard forward models because of their inductive bias towards smooth regressors, conflicting with the inherent nature of transitions such as discontinuity or large curvature. In this work, we propose to model the transition probability implicitly through a scalar-value energy function, which enables not only flexible distribution prediction but also capturing complex transition behaviors. The Energy-based Transition Models (ETM) are shown to accurately fit the discontinuous transition functions and better generalize to out-of-distribution transition data. Furthermore, we demonstrate that energy-based transition models improve the evaluation accuracy and significantly outperform other off-policy evaluation methods on the DOPE benchmark. Finally, we show that energy-based transition models also benefit reinforcement learning and outperform prior offline RL algorithms in D4RL Gym-Mujoco tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24w.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24w/chen24w.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24w.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruifeng
family: Chen
- given: Chengxing
family: Jia
- given: Zefang
family: Huang
- given: Tian-Shuo
family: Liu
- given: Xu-Hui
family: Liu
- given: Yang
family: Yu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 6992-7014
id: chen24w
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 6992
lastpage: 7014
published: 2024-07-08 00:00:00 +0000
- title: 'Efficient Pareto Manifold Learning with Low-Rank Structure'
abstract: 'Multi-task learning, which optimizes performance across multiple tasks, is inherently a multi-objective optimization problem. Various algorithms are developed to provide discrete trade-off solutions on the Pareto front. Recently, continuous Pareto front approximations using a linear combination of base networks have emerged as a compelling strategy. However, it suffers from scalability issues when the number of tasks is large. To address this issue, we propose a novel approach that integrates a main network with several low-rank matrices to efficiently learn the Pareto manifold. It significantly reduces the number of parameters and facilitates the extraction of shared features. We also introduce orthogonal regularization to further bolster performance. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art baselines, especially on datasets with a large number of tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24x.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24x/chen24x.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24x.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weiyu
family: Chen
- given: James
family: Kwok
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7015-7032
id: chen24x
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7015
lastpage: 7032
published: 2024-07-08 00:00:00 +0000
- title: 'Toward Adaptive Reasoning in Large Language Models with Thought Rollback'
abstract: 'Large language models (LLMs) have been routinely used to solve various tasks using step-by-step reasoning. However, the structure of intermediate reasoning steps, or *thoughts*, is rigid and unidirectional, such as chains, trees, or directed acyclic graphs. Consequently, the resulting inflexible and forward-only reasoning may not address challenging tasks and can fail when the LLM frequently gives false responses, i.e., hallucinations. This paper proposes a new reasoning framework, called *Thought Rollback* (TR), allowing LLMs to adaptively build thought structure while maintaining effective reasoning toward problem-solving under hallucinations. The core mechanism of TR is *rolling back thoughts*, which allows LLMs to perform error analysis on thoughts, and thus roll back to any previously mistaken thought for revision. Subsequently, by including such trial-and-error in the prompt to guide the LLM, each rollback leads to one more reliable reasoning path. Therefore, starting with a simple prompt without human annotations, an LLM with TR adaptively and gradually explores thoughts for a correct solution. Comprehensive experiments on mathematical problems and multi-task reasoning demonstrate the state-of-the-art performance of TR in terms of problem-solving rate and interaction cost. For instance, the solving rate of GPT-4 with TR outperforms the current best by $9\%$ on the MATH dataset. The source code is available under the folder *examples/ThoughtRollback* of https://github.com/iQua/llmpebase.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24y.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24y/chen24y.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24y.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sijia
family: Chen
- given: Baochun
family: Li
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7033-7056
id: chen24y
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7033
lastpage: 7056
published: 2024-07-08 00:00:00 +0000
- title: 'Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank'
abstract: 'Unbiased Learning to Rank (ULTR) aims to train unbiased ranking models from biased click logs, by explicitly modeling a generation process for user behavior and fitting click data based on the examination hypothesis. Previous research found empirically that the true latent relevance is mostly recoverable through click fitting. However, we demonstrate that this is not always achievable, resulting in a significant reduction in ranking performance. This research investigates, from first principles, the conditions under which relevance can be recovered from click data. We initially characterize a ranking model as identifiable if it can recover the true relevance up to a scaling transformation, a criterion sufficient for the pairwise ranking objective. Subsequently, we investigate an equivalent condition for identifiability, articulated as a graph connectivity test problem: the recovery of relevance is feasible if and only if the identifiability graph (IG), derived from the underlying structure of the dataset, is connected. The presence of a disconnected IG may lead to degenerate cases and suboptimal ranking performance. To tackle this challenge, we introduce two methods, namely node intervention and node merging, designed to modify the dataset and restore the connectivity of the IG. Empirical results derived from a simulated dataset and two real-world LTR benchmark datasets not only validate our proposed theory, but also demonstrate the effectiveness of our methods in alleviating data bias when the relevance model is unidentifiable.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24z.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24z/chen24z.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24z.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mouxiang
family: Chen
- given: Chenghao
family: Liu
- given: Zemin
family: Liu
- given: Zhuo
family: Li
- given: Jianling
family: Sun
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7057-7080
id: chen24z
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7057
lastpage: 7080
published: 2024-07-08 00:00:00 +0000
- title: 'High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization'
abstract: 'This paper studies kernel ridge regression in high dimensions under covariate shifts and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high dimensional kernels under covariate shifts. By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance. For the bias, we analyze the regularization at an arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales. In our analysis, the bias and variance can be characterized by the spectral decay of a data-dependent regularized kernel: the original kernel matrix associated with an additional re-weighting matrix, and thus the re-weighting strategy can be regarded as a data-dependent regularization for better understanding. Besides, our analysis provides an asymptotic expansion of kernel functions/vectors under covariate shift, which is of independent interest.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24aa.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24aa/chen24aa.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24aa.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yihang
family: Chen
- given: Fanghui
family: Liu
- given: Taiji
family: Suzuki
- given: Volkan
family: Cevher
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7081-7102
id: chen24aa
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7081
lastpage: 7102
published: 2024-07-08 00:00:00 +0000
- title: 'DiJiang: Efficient Large Language Models through Compact Kernelization'
abstract: 'In an effort to reduce the computational load of Transformers, research on linear attention has gained significant momentum. However, the improvement strategies for attention mechanisms typically necessitate extensive retraining, which is impractical for large language models with a vast array of parameters. In this paper, we present DiJiang, a novel Frequency Domain Kernelization approach that enables the transformation of a pre-trained vanilla Transformer into a linear complexity model at little training cost. By employing a weighted Quasi-Monte Carlo method for sampling, the proposed approach theoretically offers superior approximation efficiency. To further reduce the training computational complexity, our kernelization is based on Discrete Cosine Transform (DCT) operations. Extensive experiments demonstrate that the proposed method achieves comparable performance to the original Transformer, but with significantly reduced training costs and much faster inference speeds. Our DiJiang-7B achieves comparable performance with LLaMA2-7B on various benchmarks while requiring only about 1/50 of the training cost. Code is available at https://github.com/YuchuanTian/DiJiang.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ab.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ab/chen24ab.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ab.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hanting
family: Chen
- given: Zhicheng
family: Liu
- given: Xutao
family: Wang
- given: Yuchuan
family: Tian
- given: Yunhe
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7103-7117
id: chen24ab
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7103
lastpage: 7117
published: 2024-07-08 00:00:00 +0000
- title: 'GeoMFormer: A General Architecture for Geometric Molecular Representation Learning'
abstract: 'Molecular modeling, a central topic in quantum mechanics, aims to accurately calculate the properties and simulate the behaviors of molecular systems. The molecular model is governed by physical laws, which impose geometric constraints such as invariance and equivariance to coordinate rotation and translation. While numerous deep learning approaches have been developed to learn molecular representations under these constraints, most of them are built upon heuristic and costly modules. We argue that there is a strong need for a general and flexible framework for learning both invariant and equivariant features. In this work, we introduce a novel Transformer-based molecular model called GeoMFormer to achieve this goal. Using the standard Transformer modules, two separate streams are developed to maintain and learn invariant and equivariant representations. Carefully designed *cross-attention* modules bridge the two streams, allowing information fusion and enhancing geometric modeling in each stream. As a general and flexible architecture, we show that many previous architectures can be viewed as special instantiations of GeoMFormer. Extensive experiments are conducted to demonstrate the power of GeoMFormer. All empirical results show that GeoMFormer achieves strong performance on both invariant and equivariant tasks of different types and scales. Code and models will be made publicly available at https://github.com/c-tl/GeoMFormer.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ac.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ac/chen24ac.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ac.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tianlang
family: Chen
- given: Shengjie
family: Luo
- given: Di
family: He
- given: Shuxin
family: Zheng
- given: Tie-Yan
family: Liu
- given: Liwei
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7118-7142
id: chen24ac
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7118
lastpage: 7142
published: 2024-07-08 00:00:00 +0000
- title: 'TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision'
abstract: 'Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these complexities, though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the *Time-Evolving Natural Gradient (TENG)*, generalizing time-dependent variational principles and optimization-based time integration, leveraging natural gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms like TENG-Euler and its high-order variants, such as TENG-Heun, tailored for enhanced precision and efficiency. TENG’s effectiveness is further validated through its performance, surpassing current leading methods and achieving *machine precision* in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers’ equation.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ad.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ad/chen24ad.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ad.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhuo
family: Chen
- given: Jacob
family: Mccarran
- given: Esteban
family: Vizcaino
- given: Marin
family: Soljacic
- given: Di
family: Luo
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7143-7162
id: chen24ad
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7143
lastpage: 7162
published: 2024-07-08 00:00:00 +0000
- title: 'EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism'
abstract: 'We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). While recent works have shown preliminary evidence for the efficacy of early exiting in accelerating LLM inference, EE-LLM makes a foundational step towards scaling up early-exit LLMs by supporting their training and inference with massive 3D parallelism. Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting, including a lightweight method that facilitates backpropagation for the early-exit training objective with pipeline parallelism, techniques of leveraging idle resources in the original pipeline schedule for computation related to early-exit layers, and two approaches of early-exit inference that are compatible with KV caching for autoregressive generation. Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead compared to standard LLM training, as well as outstanding inference speedup without compromising output quality. To facilitate further research and adoption, we release EE-LLM at https://github.com/pan-x-c/EE-LLM.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ae.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ae/chen24ae.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ae.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yanxi
family: Chen
- given: Xuchen
family: Pan
- given: Yaliang
family: Li
- given: Bolin
family: Ding
- given: Jingren
family: Zhou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7163-7189
id: chen24ae
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7163
lastpage: 7189
published: 2024-07-08 00:00:00 +0000
- title: 'TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning'
abstract: 'Deep neural networks, including transformers and convolutional neural networks (CNNs), have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., quantification of disease-related anomalous points in ECG and anomaly detection in signals). To address this challenge, we formally discuss and reformulate MTSC as a weakly supervised problem, introducing a novel multiple-instance learning (MIL) framework for better localization of patterns of interest and modeling time dependencies within time series. Our novel approach, TimeMIL, formulates the temporal correlation and ordering within a time-aware MIL pooling, leveraging a tokenized transformer with a specialized learnable wavelet positional token. The proposed method surpassed 26 recent state-of-the-art MTSC methods, underscoring the effectiveness of the weakly supervised TimeMIL in MTSC. The code is available at https://github.com/xiwenc1/TimeMIL.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24af.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24af/chen24af.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24af.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiwen
family: Chen
- given: Peijie
family: Qiu
- given: Wenhui
family: Zhu
- given: Huayu
family: Li
- given: Hao
family: Wang
- given: Aristeidis
family: Sotiras
- given: Yalin
family: Wang
- given: Abolfazl
family: Razi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7190-7206
id: chen24af
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7190
lastpage: 7206
published: 2024-07-08 00:00:00 +0000
- title: 'AegisFL: Efficient and Flexible Privacy-Preserving Byzantine-Robust Cross-silo Federated Learning'
abstract: 'Privacy attacks and poisoning attacks are two of the thorniest problems in federated learning (FL). Homomorphic encryption (HE), which allows certain mathematical operations to be done in the ciphertext state, provides a way to solve these two problems simultaneously. However, existing Paillier-based and CKKS-based privacy-preserving Byzantine-robust FL (PBFL) solutions not only suffer from low efficiency but also expose the final model to the server. Additionally, these methods are limited to one robust aggregation algorithm (AGR) and are therefore vulnerable to AGR-tailored poisoning attacks. In this paper, we present AegisFL, an efficient PBFL system that provides the flexibility to change the AGR. We first observe that the core of the existing advanced AGRs is to calculate the inner products, $L_2$ norms and mean values for vectors. Based on this observation, we tailor a packing scheme for PBFL, which fits perfectly with RLWE-based fully homomorphic encryption. Under this packing scheme, the server only needs to perform one ciphertext multiplication to construct any required AGR, while the global model only belongs to honest clients. Finally, we conduct extensive experiments on different datasets and adversary settings, which also confirm the effectiveness and efficiency of our scheme.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ag.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ag/chen24ag.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ag.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dong
family: Chen
- given: Hongyuan
family: Qu
- given: Guangwu
family: Xu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7207-7219
id: chen24ag
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7207
lastpage: 7219
published: 2024-07-08 00:00:00 +0000
- title: 'MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models'
abstract: 'Multi-agent interactions between Large Language Model (LLM) agents have shown major improvements on diverse reasoning tasks. However, these involve long generations from multiple models across several rounds, making them expensive. Moreover, these multi-agent approaches fail to provide a final, single model for efficient inference. To address this, we introduce MAGDi, a new method for structured distillation of the reasoning interactions between multiple LLMs into smaller LMs. MAGDi teaches smaller models by representing multi-agent interactions as graphs, augmenting a base student model with a graph encoder, and distilling knowledge using three objective functions: next-token prediction, a contrastive loss between correct and incorrect reasoning, and a graph-based objective to model the interaction structure. Experiments on seven widely used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models, outperforming several methods that distill from a single teacher and multiple teachers. Moreover, MAGDi also demonstrates an order of magnitude higher efficiency over its teachers. We conduct extensive analyses to show that MAGDi (1) enhances the generalizability to out-of-domain tasks, (2) scales positively with the size and strength of the base student model, and (3) obtains larger improvements (via our multi-teacher training) when applying self-consistency – an inference technique that relies on model diversity.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ah.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ah/chen24ah.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ah.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Justin
family: Chen
- given: Swarnadeep
family: Saha
- given: Elias
family: Stengel-Eskin
- given: Mohit
family: Bansal
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7220-7235
id: chen24ah
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7220
lastpage: 7235
published: 2024-07-08 00:00:00 +0000
- title: 'CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process'
abstract: 'Identifying the underlying time-delayed latent causal processes in sequential data is vital for grasping temporal dynamics and supporting downstream reasoning. While some recent methods can robustly identify these latent causal variables, they rely on strict assumptions about the invertible generation process from latent variables to observed data. However, these assumptions are often hard to satisfy in real-world applications containing information loss. For instance, the visual perception process translates a 3D space into 2D images, or the phenomenon of persistence of vision incorporates historical data into current perceptions. To address this challenge, we establish an identifiability theory that allows for the recovery of independent latent components even when they come from a nonlinear and non-invertible mixture. Using this theory as a foundation, we propose a principled approach, CaRiNG, to learn the Causal Representation of Non-invertible Generative temporal data with identifiability guarantees. Specifically, we utilize temporal context to recover lost latent information and apply the conditions in our theory to guide the training process. Through experiments conducted on synthetic datasets, we validate that our CaRiNG method reliably identifies the causal process, even when the generation process is non-invertible. Moreover, we demonstrate that our approach considerably improves temporal understanding and reasoning in practical applications.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ai.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ai/chen24ai.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ai.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Guangyi
family: Chen
- given: Yifan
family: Shen
- given: Zhenhao
family: Chen
- given: Xiangchen
family: Song
- given: Yuewen
family: Sun
- given: Weiran
family: Yao
- given: Xiao
family: Liu
- given: Kun
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7236-7259
id: chen24ai
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7236
lastpage: 7259
published: 2024-07-08 00:00:00 +0000
- title: 'GRATH: Gradual Self-Truthifying for Large Language Models'
abstract: 'Truthfulness is paramount for large language models (LLMs) as they are increasingly deployed in real-world applications. However, existing LLMs still struggle with generating truthful content, as evidenced by their modest performance on benchmarks like TruthfulQA. To address this issue, we propose GRAdual self-truTHifying (GRATH), a novel post-processing method to enhance the truthfulness of LLMs. GRATH utilizes out-of-domain question prompts to generate pairwise truthfulness training data, with each pair containing a question and its correct and incorrect answers, and then optimizes the model via direct preference optimization (DPO) to learn from the truthfulness difference between answer pairs. GRATH iteratively refines truthfulness data and updates the model, leading to a gradual improvement in model truthfulness in a self-supervised manner. Empirically, we evaluate GRATH using different 7B-LLMs and compare it with LLMs of similar or even larger sizes on benchmark datasets. Our results show that GRATH effectively improves LLMs’ truthfulness without compromising other core capabilities. Notably, GRATH achieves state-of-the-art performance on TruthfulQA, with an MC1 accuracy of 54.71% and an MC2 accuracy of 69.10%, which even surpass those of 70B-LLMs. The code is available at https://github.com/chenweixin107/GRATH.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24aj.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24aj/chen24aj.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24aj.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weixin
family: Chen
- given: Dawn
family: Song
- given: Bo
family: Li
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7260-7279
id: chen24aj
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7260
lastpage: 7279
published: 2024-07-08 00:00:00 +0000
- title: 'Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts'
abstract: 'Recent successes suggest that parameter-efficient fine-tuning of foundation models is becoming the state-of-the-art method for transfer learning in vision, gradually replacing the rich literature of alternatives such as meta-learning. In trying to harness the best of both worlds, meta-tuning introduces a subsequent optimization stage of foundation models but has so far only shown limited success and crucially tends to underperform on out-of-distribution (OOD) tasks. In this paper, we introduce Sparse MetA-Tuning (SMAT), a method inspired by sparse mixture-of-experts approaches and trained to isolate subsets of pre-trained parameters automatically for meta-tuning on each task. SMAT successfully overcomes OOD sensitivity and delivers on the promise of enhancing the transfer abilities of vision foundation models beyond parameter-efficient finetuning. We establish new state-of-the-art results on a challenging combination of Meta-Dataset augmented with additional OOD tasks in both zero-shot and gradient-based adaptation settings. In addition, we provide a thorough analysis of the superiority of learned over hand-designed sparsity patterns for sparse expert methods and the pivotal importance of the sparsity level in balancing between in-distribution and out-of-distribution generalization. Our code and models are publicly available.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ak.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ak/chen24ak.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ak.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shengzhuang
family: Chen
- given: Jihoon
family: Tack
- given: Yunqiao
family: Yang
- given: Yee Whye
family: Teh
- given: Jonathan Richard
family: Schwarz
- given: Ying
family: Wei
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7280-7297
id: chen24ak
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7280
lastpage: 7297
published: 2024-07-08 00:00:00 +0000
- title: 'Performative Prediction with Bandit Feedback: Learning through Reparameterization'
abstract: 'Performative prediction, as introduced by Perdomo et al., is a framework for studying social prediction in which the data distribution itself changes in response to the deployment of a model. Existing work in this field usually hinges on three assumptions that are easily violated in practice: that the performative risk is convex over the deployed model, that the mapping from the model to the data distribution is known to the model designer in advance, and that first-order information of the performative risk is available. In this paper, we initiate the study of performative prediction problems that do not require these assumptions. Specifically, we develop a reparameterization framework that parameterizes the performative prediction objective as a function of the induced data distribution. We also develop a two-level zeroth-order optimization procedure, where the first level performs iterative optimization on the distribution parameter space, and the second level learns the model that induced a particular target distribution parameter at each iteration. Under mild conditions, this reparameterization allows us to transform the non-convex objective into a convex one and achieve provable regret guarantees. In particular, we provide a regret bound that is sublinear in the total number of performative samples taken and is only polynomial in the dimension of the model parameter.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24al.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24al/chen24al.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24al.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yatong
family: Chen
- given: Wei
family: Tang
- given: Chien-Ju
family: Ho
- given: Yang
family: Liu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7298-7324
id: chen24al
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7298
lastpage: 7324
published: 2024-07-08 00:00:00 +0000
- title: 'Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes'
abstract: 'While the great capability of Transformers significantly boosts prediction accuracy, it could also yield overconfident predictions and require calibrated uncertainty estimation, which can be commonly tackled by Gaussian processes (GPs). Existing works apply GPs with symmetric kernels under variational inference to the attention kernel; however, they omit the fact that attention kernels are in essence asymmetric. Moreover, the complexity of deriving the GP posteriors remains high for large-scale data. In this work, we propose Kernel-Eigen Pair Sparse Variational Gaussian Processes (KEP-SVGP) for building uncertainty-aware self-attention, where the asymmetry of attention kernels is tackled by Kernel SVD (KSVD) and a reduced complexity is acquired. Through KEP-SVGP, i) the SVGP pair induced by the two sets of singular vectors from KSVD w.r.t. the attention kernel fully characterizes the asymmetry; ii) using only a small set of adjoint eigenfunctions from KSVD, the derivation of SVGP posteriors can be based on the inversion of a diagonal matrix containing singular values, contributing to a reduction in time complexity; iii) an evidence lower bound is derived so that variational parameters and network weights can be optimized with it. Experiments verify the excellent performance and efficiency of our method on in-distribution, distribution-shift and out-of-distribution benchmarks.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24am.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24am/chen24am.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24am.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yingyi
family: Chen
- given: Qinghua
family: Tao
- given: Francesco
family: Tonin
- given: Johan
family: Suykens
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7325-7345
id: chen24am
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7325
lastpage: 7345
published: 2024-07-08 00:00:00 +0000
- title: 'Recovering Labels from Local Updates in Federated Learning'
abstract: 'Gradient inversion (GI) attacks present a threat to the privacy of clients in federated learning (FL) by aiming to enable reconstruction of the clients’ data from communicated model updates. A number of such techniques attempt to accelerate data recovery by first reconstructing labels of the samples used in local training. However, existing label extraction methods make strong assumptions that typically do not hold in realistic FL settings. In this paper we present a novel label recovery scheme, Recovering Labels from Local Updates (RLU), which provides near-perfect accuracy when attacking untrained (most vulnerable) models. More significantly, RLU achieves high performance even in realistic settings where the clients in an FL system run multiple local epochs, train on heterogeneous data, and deploy various optimizers to minimize different objective functions. Specifically, RLU estimates labels by solving a least-square problem that emerges from the analysis of the correlation between labels of the data points used in a training round and the resulting update of the output layer. The experimental results on several datasets, architectures, and data heterogeneity scenarios demonstrate that the proposed method consistently outperforms existing baselines, and helps improve quality of the reconstructed images in GI attacks in terms of both PSNR and LPIPS.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24an.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24an/chen24an.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24an.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Huancheng
family: Chen
- given: Haris
family: Vikalo
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7346-7372
id: chen24an
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7346
lastpage: 7372
published: 2024-07-08 00:00:00 +0000
- title: 'SelfIE: Self-Interpretation of Large Language Model Embeddings'
abstract: 'How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM’s reasoning process is key for reliability, transparency, and future model developments. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE’s text descriptions on hidden embeddings open avenues to control LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation of an individual layer. We extend RLHF to hidden embeddings and propose Reinforcement Control, which erases harmful knowledge in LLMs without supervision targets.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ao.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ao/chen24ao.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ao.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Haozhe
family: Chen
- given: Carl
family: Vondrick
- given: Chengzhi
family: Mao
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7373-7388
id: chen24ao
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7373
lastpage: 7388
published: 2024-07-08 00:00:00 +0000
- title: 'Locally Differentially Private Decentralized Stochastic Bilevel Optimization with Guaranteed Convergence Accuracy'
abstract: 'Decentralized bilevel optimization based machine learning techniques are achieving remarkable success in a wide variety of domains. However, the intensive exchange of information (involving nested-loops of consensus or communication iterations) in existing decentralized bilevel optimization algorithms leads to a great challenge to ensure rigorous differential privacy, which, however, is necessary to bring the benefits of machine learning to domains where involved data are sensitive. By proposing a new decentralized stochastic bilevel-optimization algorithm which avoids nested-loops of information-exchange iterations, we achieve, for the first time, both differential privacy and accurate convergence in decentralized bilevel optimization. This is significant since even for single-level decentralized optimization and learning, existing differential-privacy solutions have to sacrifice convergence accuracy for privacy. Besides characterizing the convergence rate under nonconvex/convex/strongly convex conditions, we also rigorously quantify the price of differential privacy in the convergence rate. Experimental results on machine learning models confirm the efficacy of our algorithm.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ap.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ap/chen24ap.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ap.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ziqin
family: Chen
- given: Yongqiang
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7389-7439
id: chen24ap
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7389
lastpage: 7439
published: 2024-07-08 00:00:00 +0000
- title: 'Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments'
abstract: 'Learning policies for multi-entity systems in 3D environments is far more complicated than in single-entity scenarios, due to the exponential expansion of the global state space as the number of entities increases. One potential solution for alleviating the exponential complexity is dividing the global space into independent local views that are invariant to transformations including translations and rotations. To this end, this paper proposes *Subequivariant Hierarchical Neural Networks* (SHNN) to facilitate multi-entity policy learning. In particular, SHNN first dynamically decouples the global space into local entity-level graphs via task assignment. Second, it leverages subequivariant message passing over the local entity-level graphs to devise local reference frames, remarkably compressing the representation redundancy, particularly in gravity-affected environments. Furthermore, to overcome the limitations of existing benchmarks in capturing the subtleties of multi-entity systems under the Euclidean symmetry, we propose the *Multi-entity Benchmark* (MEBEN), a new suite of environments tailored for exploring a wide range of multi-entity reinforcement learning. Extensive experiments demonstrate significant advancements of SHNN on the proposed benchmarks compared to existing methods. Comprehensive ablations are conducted to verify the indispensability of task assignment and subequivariance.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24aq.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24aq/chen24aq.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24aq.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Runfa
family: Chen
- given: Ling
family: Wang
- given: Yu
family: Du
- given: Tianrui
family: Xue
- given: Fuchun
family: Sun
- given: Jianwei
family: Zhang
- given: Wenbing
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7440-7461
id: chen24aq
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7440
lastpage: 7461
published: 2024-07-08 00:00:00 +0000
- title: 'A General Framework for Learning from Weak Supervision'
abstract: 'Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering practical deployment. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an advanced algorithm that significantly simplifies the EM computational demands using a Non-deterministic Finite Automaton (NFA) along with a forward-backward algorithm, which effectively reduces time complexity from the quadratic or factorial scale often required by existing solutions to linear scale. The problem of learning from arbitrary weak supervision is therefore converted to modeling it with an NFA. GLWS not only enhances the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advancements and practical deployment in this field.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ar.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ar/chen24ar.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ar.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hao
family: Chen
- given: Jindong
family: Wang
- given: Lei
family: Feng
- given: Xiang
family: Li
- given: Yidong
family: Wang
- given: Xing
family: Xie
- given: Masashi
family: Sugiyama
- given: Rita
family: Singh
- given: Bhiksha
family: Raj
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7462-7485
id: chen24ar
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7462
lastpage: 7485
published: 2024-07-08 00:00:00 +0000
- title: 'Diffusion Model-Augmented Behavioral Cloning'
abstract: 'Imitation learning addresses the challenge of learning by observing an expert’s demonstrations without access to reward signals from environments. Most existing imitation learning methods that do not require interacting with environments either model the expert distribution as the conditional probability p(a|s) (e.g., behavioral cloning, BC) or the joint probability p(s, a). Despite the simplicity of modeling the conditional probability with BC, it usually struggles with generalization. While modeling the joint probability can improve generalization performance, the inference procedure is often time-consuming, and the model can suffer from manifold overfitting. This work proposes an imitation learning framework that benefits from modeling both the conditional and joint probability of the expert distribution. Our proposed Diffusion Model-Augmented Behavioral Cloning (DBC) employs a diffusion model trained to model expert behaviors and learns a policy to optimize both the BC loss (conditional) and our proposed diffusion model loss (joint). DBC outperforms baselines in various continuous control tasks in navigation, robot arm manipulation, dexterous manipulation, and locomotion. We design additional experiments to verify the limitations of modeling either the conditional probability or the joint probability of the expert distribution, as well as compare different generative models. Ablation studies justify the effectiveness of our design choices.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24as.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24as/chen24as.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24as.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shang-Fu
family: Chen
- given: Hsiang-Chun
family: Wang
- given: Ming-Hao
family: Hsu
- given: Chun-Mao
family: Lai
- given: Shao-Hua
family: Sun
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7486-7510
id: chen24as
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7486
lastpage: 7510
published: 2024-07-08 00:00:00 +0000
- title: 'AutoOS: Make Your OS More Powerful by Exploiting Large Language Models'
abstract: 'With the rapid development of Artificial Intelligence of Things (AIoT), customizing and optimizing operating system (OS) kernel configurations for various AIoT application scenarios is crucial for maximizing system performance. However, existing approaches falter due to the overwhelming problem complexity (i.e., over 15,000 configuration options in the Linux kernel), together with the huge evaluation costs and error-prone options that may result in OS boot-up failure, which all make it an unresolved problem to optimize the Linux kernel automatically. In this paper, we introduce AutoOS, a novel framework exploiting Large Language Models for customizing and optimizing OS kernel configurations automatically for various AIoT application scenarios. Inspired by the inherently directory-structured kernel configuration process, we first formulate our research problem as optimizing on a dynamic tree. We then propose a novel framework integrating a state machine-based traversal algorithm as the observe-prune-propose-act-correct loop, which can effectively refine the optimization space and ensure a successful OS boot-up. Experimental results show that AutoOS can automatically customize and optimize the OS kernel configurations without human effort. More importantly, AutoOS even achieves up to 25% better performance than the vendor-provided configuration.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24at.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24at/chen24at.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24at.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Huilai
family: Chen
- given: Yuanbo
family: Wen
- given: Limin
family: Cheng
- given: Shouxu
family: Kuang
- given: Yumeng
family: Liu
- given: Weijia
family: Li
- given: Ling
family: Li
- given: Rui
family: Zhang
- given: Xinkai
family: Song
- given: Wei
family: Li
- given: Qi
family: Guo
- given: Yunji
family: Chen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7511-7525
id: chen24at
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7511
lastpage: 7525
published: 2024-07-08 00:00:00 +0000
- title: 'Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning'
abstract: 'Operator learning for Partial Differential Equations (PDEs) is rapidly emerging as a promising approach for surrogate modeling of intricate systems. Transformers with the self-attention mechanism—a powerful tool originally designed for natural language processing—have recently been adapted for operator learning. However, they confront challenges, including high computational demands and limited interpretability. This raises a critical question: *Is there a more efficient attention mechanism for Transformer-based operator learning?* This paper proposes the Position-induced Transformer (PiT), built on an innovative position-attention mechanism, which demonstrates significant advantages over the classical self-attention in operator learning. Position-attention draws inspiration from numerical methods for PDEs. Different from self-attention, position-attention is induced by only the spatial interrelations of sampling positions for input functions of the operators, and does not rely on the input function values themselves, thereby greatly boosting efficiency. PiT exhibits superior performance over current state-of-the-art neural operators in a variety of complex operator learning tasks across diverse PDE benchmarks. Additionally, PiT possesses an enhanced discretization convergence feature, compared to the widely-used Fourier neural operator.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24au.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24au/chen24au.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24au.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junfeng
family: Chen
- given: Kailiang
family: Wu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7526-7552
id: chen24au
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7526
lastpage: 7552
published: 2024-07-08 00:00:00 +0000
- title: 'In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation'
abstract: 'Large language models (LLMs) frequently hallucinate, e.g., making factual errors, yet our understanding of why they make these errors remains limited. In this study, we aim to understand the underlying mechanisms of LLM hallucinations from the perspective of *inner representations*. We discover a pattern associated with hallucinations: correct generations tend to have *sharper* context activations in the hidden states of the in-context tokens, compared to those of incorrect generations. Leveraging this signal, we propose an entropy-based metric to quantify the *sharpness* among the in-context hidden states and incorporate it into the decoding process, i.e., use the entropy value to adjust the next token prediction distribution to improve the factuality and overall quality of the generated text. Experiments on knowledge-seeking datasets (Natural Questions, HotpotQA, TriviaQA) and a hallucination benchmark (TruthfulQA) demonstrate the consistent effectiveness of our method, e.g., up to 8.6 absolute points on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24av.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24av/chen24av.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24av.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shiqi
family: Chen
- given: Miao
family: Xiong
- given: Junteng
family: Liu
- given: Zhengxuan
family: Wu
- given: Teng
family: Xiao
- given: Siyang
family: Gao
- given: Junxian
family: He
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7553-7567
id: chen24av
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7553
lastpage: 7567
published: 2024-07-08 00:00:00 +0000
- title: 'Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting'
abstract: 'Uncertainty estimation is crucial for deep learning models to detect out-of-distribution (OOD) inputs. However, naive deep learning classifiers produce uncalibrated uncertainty for OOD data. Improving the uncertainty estimation typically requires external data for OOD-aware training or considerable costs to build an ensemble. In this work, we improve on uncertainty estimation without extra OOD data or additional inference costs using an alternative *Split-Ensemble* method. Specifically, we propose a novel *subtask-splitting* ensemble training objective where a task is split into several complementary subtasks based on feature similarity. Each subtask considers part of the data as in distribution while all the rest as OOD data. Diverse submodels can therefore be trained on each subtask with OOD-aware objectives, learning generalizable uncertainty estimation. To avoid overheads, we enable low-level feature sharing among submodels, building a tree-like Split-Ensemble architecture via iterative splitting and pruning. Empirical study shows Split-Ensemble, without additional computational cost, improves accuracy over a single model by 0.8%, 1.8%, and 25.5% on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. OOD detection for the same backbone and in-distribution datasets surpasses a single model baseline by 2.2%, 8.1%, and 29.6% in mean AUROC, respectively.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24aw.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24aw/chen24aw.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24aw.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anthony
family: Chen
- given: Huanrui
family: Yang
- given: Yulu
family: Gan
- given: Denis A
family: Gudovskiy
- given: Zhen
family: Dong
- given: Haofan
family: Wang
- given: Tomoyuki
family: Okuno
- given: Yohei
family: Nakata
- given: Kurt
family: Keutzer
- given: Shanghang
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7568-7585
id: chen24aw
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7568
lastpage: 7585
published: 2024-07-08 00:00:00 +0000
- title: 'Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration'
abstract: 'One-shot imitation learning (OSIL) is to learn an imitator agent that can execute multiple tasks with only a single demonstration. In real-world scenarios, the environment is dynamic, e.g., unexpected changes can occur after the demonstration. Thus, achieving generalization of the imitator agent is crucial as agents would inevitably face situations unseen in the provided demonstrations. While traditional OSIL methods excel in relatively stationary settings, their adaptability to such unforeseen changes, which ask for a higher level of generalization ability from the imitator agents, is limited and rarely discussed. In this work, we present a new algorithm called Deep Demonstration Tracing (DDT). In DDT, we propose a demonstration transformer architecture to encourage agents to adaptively trace suitable states in demonstrations. Besides, it integrates OSIL into a meta-reinforcement-learning training paradigm, providing regularization for policies in unexpected situations. We evaluate DDT on a new navigation task suite and robotics tasks, demonstrating its superior performance over existing OSIL methods across all evaluated tasks in dynamic environments with unforeseen changes. The project page is at https://osil-ddt.github.io.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ax.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ax/chen24ax.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ax.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiong-Hui
family: Chen
- given: Junyin
family: Ye
- given: Hang
family: Zhao
- given: Yi-Chen
family: Li
- given: Xu-Hui
family: Liu
- given: Haoran
family: Shi
- given: Yu-Yan
family: Xu
- given: Zhihao
family: Ye
- given: Si-Hang
family: Yang
- given: Yang
family: Yu
- given: Anqi
family: Huang
- given: Kai
family: Xu
- given: Zongzhang
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7586-7620
id: chen24ax
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7586
lastpage: 7620
published: 2024-07-08 00:00:00 +0000
- title: 'DRCT: Diffusion Reconstruction Contrastive Training towards Universal Detection of Diffusion Generated Images'
abstract: 'Diffusion models have made significant strides in visual content generation but also raised increasing demands on generated image detection. Existing detection methods have achieved considerable progress, but they usually suffer a significant decline in accuracy when detecting images generated by an unseen diffusion model. In this paper, we seek to address the generalizability of generated image detectors from the perspective of hard sample classification. The basic idea is that if a classifier can distinguish generated images that closely resemble real ones, then it can also effectively detect less similar samples, potentially even those produced by a different diffusion model. Based on this idea, we propose Diffusion Reconstruction Contrastive Learning (DRCT), a universal framework to enhance the generalizability of existing detectors. DRCT generates hard samples by high-quality diffusion reconstruction and adopts contrastive training to guide the learning of diffusion artifacts. In addition, we have built a million-scale dataset, DRCT-2M, including 16 types of diffusion models, for the evaluation of generalizability of detection methods. Extensive experimental results show that detectors enhanced with DRCT achieve over a 10% accuracy improvement in cross-set tests. The code, models, and dataset will soon be available at https://github.com/beibuwandeluori/DRCT.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ay.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ay/chen24ay.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ay.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Baoying
family: Chen
- given: Jishen
family: Zeng
- given: Jianquan
family: Yang
- given: Rui
family: Yang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7621-7639
id: chen24ay
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7621
lastpage: 7639
published: 2024-07-08 00:00:00 +0000
- title: '$\rm E(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning'
abstract: 'Identification and analysis of symmetrical patterns in the natural world have led to significant discoveries across various scientific fields, such as the formulation of gravitational laws in physics and advancements in the study of chemical structures. In this paper, we focus on exploiting Euclidean symmetries inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetries that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures with symmetric constraints embedded as an inductive bias for multi-agent actor-critic methods. This inductive bias results in superior performance in various cooperative MARL benchmarks and impressive generalization capabilities such as zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24az.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24az/chen24az.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24az.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dingyang
family: Chen
- given: Qi
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7640-7666
id: chen24az
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7640
lastpage: 7666
published: 2024-07-08 00:00:00 +0000
- title: 'FedMBridge: Bridgeable Multimodal Federated Learning'
abstract: 'Multimodal Federated Learning (MFL) addresses the setup of multiple clients with diversified modality types (e.g. image, text, video, and audio) working together to improve their local personal models in a privacy-preserving manner. Prior MFL works rely on restrictive compositional neural architecture designs to ensure inter-client information sharing via blockwise model aggregation, limiting their applicability in real-world **Architecture-personalized MFL (AMFL)** scenarios, where clients may have distinct multimodal interaction strategies and there is no restriction on local architecture design. The key challenge in AMFL is how to automatically and efficiently tackle the two heterogeneity patterns–statistical and architecture heterogeneity–while maximizing the beneficial information sharing among clients. To solve this challenge, we propose **FedMBridge**, which leverages a topology-aware hypernetwork to act as a bridge that can automatically balance and digest the two heterogeneity patterns in a communication-efficient manner. Our experiments on four AMFL simulations demonstrate the efficiency and effectiveness of our proposed approach.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24ba.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24ba/chen24ba.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24ba.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiayi
family: Chen
- given: Aidong
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7667-7686
id: chen24ba
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7667
lastpage: 7686
published: 2024-07-08 00:00:00 +0000
- title: 'Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness'
abstract: 'Robustness is a vital aspect to consider when deploying deep learning models into the wild. Numerous studies have been dedicated to the study of the robustness of vision transformers (ViTs), which have dominated as the mainstream backbone choice for vision tasks since the dawn of the 2020s. Recently, some large kernel convnets have made a comeback with impressive performance and efficiency. However, it still remains unclear whether large kernel networks are robust and what their robustness can be attributed to. In this paper, we first conduct a comprehensive evaluation of large kernel convnets’ robustness and their differences from typical small kernel counterparts and ViTs on six diverse robustness benchmark datasets. Then, to analyze the underlying factors behind their strong robustness, we design experiments from both quantitative and qualitative perspectives to reveal large kernel convnets’ intriguing properties that are completely different from those of typical convnets. Our experiments demonstrate for the first time that pure CNNs can achieve exceptional robustness comparable or even superior to that of ViTs. Our analysis of occlusion invariance, kernel attention patterns, and frequency characteristics provides novel insights into the source of robustness. Code available at: https://github.com/Lauch1ng/LKRobust.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bb.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bb/chen24bb.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bb.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Honghao
family: Chen
- given: Yurong
family: Zhang
- given: Xiaokun
family: Feng
- given: Xiangxiang
family: Chu
- given: Kaiqi
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7687-7699
id: chen24bb
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7687
lastpage: 7699
published: 2024-07-08 00:00:00 +0000
- title: 'One for All: A Universal Generator for Concept Unlearnability via Multi-Modal Alignment'
abstract: 'The abundance of free internet data offers unprecedented opportunities for researchers and developers, but it also poses privacy risks. Utilizing data without explicit consent raises critical challenges in protecting personal information. Unlearnable examples have emerged as a feasible protection approach, which renders the data unlearnable, i.e., useless to third parties, by injecting imperceptible perturbations. However, these perturbations only exhibit unlearnable effects on either a particular dataset or label-consistent scenarios, thereby lacking broad applicability. To address both issues concurrently, we propose a universal perturbation generator that harnesses data with concept unlearnability, thereby broadening the scope of unlearnability beyond specific datasets or labels. Specifically, we leverage multi-modal pre-trained models to establish a connection between the data concepts in a shared embedding space. This connection enables the information transformation from image data to text concepts. Consequently, we can align the text embedding using concept-wise discriminant loss, and render the data unlearnable. Extensive experiments conducted on real-world datasets demonstrate the concept unlearnability, i.e., cross-dataset transferability and label-agnostic utility, of our proposed unlearnable examples, as well as their robustness against attacks.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bc.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bc/chen24bc.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bc.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chaochao
family: Chen
- given: Jiaming
family: Zhang
- given: Yuyuan
family: Li
- given: Zhongxuan
family: Han
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7700-7711
id: chen24bc
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7700
lastpage: 7711
published: 2024-07-08 00:00:00 +0000
- title: 'Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks'
abstract: 'Graph Neural Networks (GNNs) have become a building block in graph data processing, with wide applications in critical domains. The growing need to deploy GNNs in high-stakes applications necessitates explainability for users in the decision-making processes. A popular paradigm for the explainability of GNNs is to identify explainable subgraphs by comparing their labels with the ones of original graphs. This task is challenging due to the substantial distributional shift from the original graphs in the training set to the set of explainable subgraphs, which prevents accurate prediction of labels with the subgraphs. To address it, in this paper, we propose a novel method that generates proxy graphs for explainable subgraphs that are in the distribution of training data. We introduce a parametric method that employs graph generators to produce proxy graphs. A new training objective based on information theory is designed to ensure that proxy graphs not only adhere to the distribution of training data but also preserve explanatory factors. Such generated proxy graphs can be reliably used to approximate the predictions of the labels of explainable subgraphs. Empirical evaluations across various datasets demonstrate that our method achieves more accurate explanations for GNNs.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bd.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bd/chen24bd.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bd.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhuomin
family: Chen
- given: Jiaxing
family: Zhang
- given: Jingchao
family: Ni
- given: Xiaoting
family: Li
- given: Yuchen
family: Bian
- given: Md Mezbahul
family: Islam
- given: Ananda
family: Mondal
- given: Hua
family: Wei
- given: Dongsheng
family: Luo
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7712-7730
id: chen24bd
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7712
lastpage: 7730
published: 2024-07-08 00:00:00 +0000
- title: 'Diffusive Gibbs Sampling'
abstract: 'The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24be.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24be/chen24be.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24be.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenlin
family: Chen
- given: Mingtian
family: Zhang
- given: Brooks
family: Paige
- given: José Miguel
family: Hernández-Lobato
- given: David
family: Barber
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7731-7747
id: chen24be
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7731
lastpage: 7747
published: 2024-07-08 00:00:00 +0000
- title: 'Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation'
abstract: 'In the realm of reinforcement learning (RL), accounting for risk is crucial for making decisions under uncertainty, particularly in applications where safety and reliability are paramount. In this paper, we introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation. Our framework covers a broad class of risk-sensitive RL, and facilitates analysis of the impact of estimation functions on the effectiveness of RSRL strategies and evaluation of their sample complexity. We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation. With our novel estimation techniques via Least Squares Regression (LSR) and Maximum Likelihood Estimation (MLE) in distributional RL with augmented Markov Decision Process (MDP), we derive the first $\widetilde{\mathcal{O}}(\sqrt{K})$ dependency of the regret upper bound for RSRL with static LRM, marking a pioneering contribution towards statistically efficient algorithms in this domain.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bf.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bf/chen24bf.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bf.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yu
family: Chen
- given: Xiangcheng
family: Zhang
- given: Siwei
family: Wang
- given: Longbo
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7748-7791
id: chen24bf
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7748
lastpage: 7791
published: 2024-07-08 00:00:00 +0000
- title: '$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts'
abstract: 'Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, the reliability assessment of MoE lags behind its surging applications. Moreover, when transferred to new domains, such as during fine-tuning, MoE models sometimes underperform their dense counterparts. Motivated by the research gap and this counter-intuitive phenomenon, we propose $\texttt{MoE-RBench}$, the first comprehensive assessment of SMoE reliability from three aspects: $\textit{(i)}$ safety and hallucination, $\textit{(ii)}$ resilience to adversarial attacks, and $\textit{(iii)}$ out-of-distribution robustness. Extensive models and datasets are tested to compare MoE to dense networks along these reliability dimensions. Our empirical observations suggest that with appropriate hyperparameters, training recipes, and inference techniques, we can build MoE models more reliably than dense LLMs. In particular, we find that the robustness of SMoE is sensitive to the basic training settings. We hope that this study can provide deeper insights into how to adapt pre-trained MoE models to other tasks with higher generation security, quality, and stability. Codes are available at https://github.com/UNITES-Lab/MoE-RBench.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bg.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bg/chen24bg.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bg.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Guanjie
family: Chen
- given: Xinyu
family: Zhao
- given: Tianlong
family: Chen
- given: Yu
family: Cheng
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7792-7808
id: chen24bg
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7792
lastpage: 7808
published: 2024-07-08 00:00:00 +0000
- title: 'LLaGA: Large Language and Graph Assistant'
abstract: 'Graph Neural Networks (GNNs) have empowered the advance in graph-structured data analysis. Recently, the rise of Large Language Models (LLMs) like GPT-4 has heralded a new era in deep learning. However, their application to graph data poses distinct challenges due to the inherent difficulty of translating graph structures to language. To this end, we introduce the **L**arge **L**anguage **a**nd **G**raph **A**ssistant (**LLaGA**), an innovative model that effectively integrates LLM capabilities to handle the complexities of graph-structured data. LLaGA retains the general-purpose nature of LLMs while adapting graph data into a format compatible with LLM input. LLaGA achieves this by reorganizing graph nodes into structure-aware sequences and then mapping these into the token embedding space through a versatile projector. LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks, extend its ability to unseen datasets or tasks, and provide explanations for graphs. Our extensive experiments across popular graph benchmarks show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model, surpassing state-of-the-art graph models in both supervised and zero-shot scenarios.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bh.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bh/chen24bh.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bh.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Runjin
family: Chen
- given: Tong
family: Zhao
- given: Ajay Kumar
family: Jaiswal
- given: Neil
family: Shah
- given: Zhangyang
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7809-7823
id: chen24bh
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7809
lastpage: 7823
published: 2024-07-08 00:00:00 +0000
- title: 'HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding'
abstract: 'While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We introduce HALC, a novel decoding algorithm designed to mitigate OH in LVLMs. HALC leverages distinct fine-grained optimal visual information in vision-language tasks and operates on both local and global contexts simultaneously. Specifically, HALC integrates a robust auto-focal grounding mechanism (locally) to correct hallucinated tokens on the fly, and a specialized beam search algorithm (globally) to significantly reduce OH while preserving text generation quality. Additionally, HALC can be integrated into any LVLM as a plug-and-play module without extra training. Extensive experimental studies demonstrate HALC’s effectiveness in reducing OH, outperforming state-of-the-art methods across four benchmarks. Code is released at https://github.com/BillChan226/HALC.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bi.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bi/chen24bi.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bi.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhaorun
family: Chen
- given: Zhuokai
family: Zhao
- given: Hongyin
family: Luo
- given: Huaxiu
family: Yao
- given: Bo
family: Li
- given: Jiawei
family: Zhou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7824-7846
id: chen24bi
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7824
lastpage: 7846
published: 2024-07-08 00:00:00 +0000
- title: 'Compact Optimality Verification for Optimization Proxies'
abstract: 'Recent years have witnessed increasing interest in optimization proxies, i.e., machine learning models that approximate the input-output mapping of parametric optimization problems and return near-optimal feasible solutions. Following recent work by (Nellikkath & Chatzivasileiadis, 2021), this paper reconsiders the optimality verification problem for optimization proxies, i.e., the determination of the worst-case optimality gap over the instance distribution. The paper proposes a compact formulation for optimality verification and a gradient-based primal heuristic that brings significant computational benefits to the original formulation. The compact formulation is also more general and applies to non-convex optimization problems. The benefits of the compact formulation are demonstrated on large-scale DC Optimal Power Flow and knapsack problems.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bj.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bj/chen24bj.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bj.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenbo
family: Chen
- given: Haoruo
family: Zhao
- given: Mathieu
family: Tanneau
- given: Pascal
family: Van Hentenryck
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7847-7863
id: chen24bj
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7847
lastpage: 7863
published: 2024-07-08 00:00:00 +0000
- title: 'Enhancing Implicit Shape Generators Using Topological Regularizations'
abstract: 'A fundamental problem in learning generative models of 3D shapes is that when the generative model is simply fitted to the training data, the resulting synthetic 3D models can present various artifacts. Many of these artifacts are topological in nature, e.g., broken legs, unrealistic thin structures, and small holes. In this paper, we introduce a principled approach that utilizes topological regularization losses on an implicit shape generator to rectify topological artifacts. The objectives are two-fold. The first is to align the persistent diagram (PD) distribution of the training shapes with that of synthetic shapes. The second ensures that the PDs are smooth among adjacent synthetic shapes. We show how to achieve these two objectives using two simple but effective formulations. Specifically, distribution alignment is achieved by learning a generative model of PDs and aligning this generator with the PDs of synthetic shapes. We show how to handle discrete and continuous variabilities of PDs by using a shape-regularization term when performing PD alignment. Moreover, we enforce the smoothness of the PDs using a smoothness loss on the PD generator, which further improves the behavior of PD distribution alignment. Experimental results on ShapeNet show that our approach leads to much better generalization behavior than state-of-the-art implicit shape generators.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bk.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bk/chen24bk.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bk.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Liyan
family: Chen
- given: Yan
family: Zheng
- given: Yang
family: Li
- given: Lohit Anirudh
family: Jagarapu
- given: Haoxiang
family: Li
- given: Hao
family: Kang
- given: Gang
family: Hua
- given: Qixing
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7864-7879
id: chen24bk
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7864
lastpage: 7879
published: 2024-07-08 00:00:00 +0000
- title: 'Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations'
abstract: 'Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose to evaluate $\textbf{counterfactual simulatability}$ of natural language explanations: whether an explanation can enable humans to precisely infer the model’s outputs on diverse counterfactuals of the explained input. For example, if a model answers ”$\textit{yes}$” to the input question ”$\textit{Can eagles fly?}$” with the explanation ”$\textit{all birds can fly}$”, then humans would infer from the explanation that it would also answer ”$\textit{yes}$” to the counterfactual input ”$\textit{Can penguins fly?}$”. If the explanation is precise, then the model’s answer should match humans’ expectations. We implemented two metrics based on counterfactual simulatability: precision and generality. We generated diverse counterfactuals automatically using LLMs. We then used these metrics to evaluate state-of-the-art LLMs (e.g., GPT-4) on two tasks: multi-hop factual reasoning and reward modeling. We found that LLM’s explanations have low precision and that precision does not correlate with plausibility. Therefore, naively optimizing human approvals (e.g., RLHF) may be insufficient.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bl.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bl/chen24bl.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bl.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yanda
family: Chen
- given: Ruiqi
family: Zhong
- given: Narutatsu
family: Ri
- given: Chen
family: Zhao
- given: He
family: He
- given: Jacob
family: Steinhardt
- given: Zhou
family: Yu
- given: Kathleen
family: Mckeown
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7880-7904
id: chen24bl
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7880
lastpage: 7904
published: 2024-07-08 00:00:00 +0000
- title: 'On the Trajectory Regularity of ODE-based Diffusion Sampling'
abstract: 'Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoising trajectory and discuss its vital role in forming the coupled sampling trajectory with a strong shape regularity, regardless of the generated content. We also describe a dynamic programming-based scheme to make the time schedule in sampling better fit the underlying trajectory structure. This simple strategy requires minimal modification to any given ODE-based numerical solver and incurs negligible computational cost, while delivering superior performance in image generation, especially in $5\sim 10$ function evaluations.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bm.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bm/chen24bm.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bm.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Defang
family: Chen
- given: Zhenyu
family: Zhou
- given: Can
family: Wang
- given: Chunhua
family: Shen
- given: Siwei
family: Lyu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7905-7934
id: chen24bm
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7905
lastpage: 7934
published: 2024-07-08 00:00:00 +0000
- title: 'ODIN: Disentangled Reward Mitigates Hacking in RLHF'
abstract: 'In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. A well-formatted, verbose but less helpful response from the LLMs can often deceive LLMs or even human evaluators and achieve high scores. The same issue also holds for some reward models in RL. To address the challenges in both training and evaluation, we establish a more reliable evaluation protocol for comparing different training configurations, which inspects the trade-off between LLM evaluation score and response length obtained by varying training hyperparameters. Based on this evaluation, we conduct large-scale studies, where the results shed light on the efficacy of hyperparameters and tricks used in RL for mitigating length bias. We further propose to improve the reward model by jointly training two linear heads to predict the preference, one trained to correlate with length and the other trained to decorrelate with length and therefore focus more on the actual content. We then discard the length head in RL to ignore the spurious length reward. Experiments demonstrate that our approach eliminates the reward correlation with length, and improves the obtained policy by a significant margin.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bn.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bn/chen24bn.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bn.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lichang
family: Chen
- given: Chen
family: Zhu
- given: Jiuhai
family: Chen
- given: Davit
family: Soselia
- given: Tianyi
family: Zhou
- given: Tom
family: Goldstein
- given: Heng
family: Huang
- given: Mohammad
family: Shoeybi
- given: Bryan
family: Catanzaro
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7935-7952
id: chen24bn
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7935
lastpage: 7952
published: 2024-07-08 00:00:00 +0000
- title: 'Stacking Deep Set Networks and Pooling by Quantiles'
abstract: 'We propose Stacked Deep Sets and Quantile Pooling for learning tasks on set data. We introduce Quantile Pooling, a novel permutation-invariant pooling operation that synergizes max and average pooling. Just like max pooling, quantile pooling emphasizes the most salient features of the data. Like average pooling, it captures the overall distribution and subtle features of the data. Like both, it is lightweight and fast. We demonstrate the effectiveness of our approach in a variety of tasks, showing that quantile pooling can outperform both max and average pooling in each of their respective strengths. We also introduce a variant of deep set networks that is more expressive and universal. While Quantile Pooling balances robustness and sensitivity, Stacked Deep Sets enhances learning with depth.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bo.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bo/chen24bo.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bo.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhuojun
family: Chen
- given: Xinghua
family: Zhu
- given: Dongzhe
family: Su
- given: Justin C. I.
family: Chuang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7953-7971
id: chen24bo
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7953
lastpage: 7971
published: 2024-07-08 00:00:00 +0000
- title: 'What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks'
abstract: 'We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of a transformer affects its ability to perform memorization, reasoning, generalization, and contextual generalization. We show that a transformer with only one attention layer can excel in memorization but falls short in other tasks. Then, we show that exhibiting reasoning and generalization ability requires the transformer to have at least two attention layers, while context generalization ability may necessitate three attention layers. Additionally, we identify a class of simple operations that a single attention layer can execute, and show that the complex tasks can be approached as the combinations of these simple operations and thus can be resolved by stacking multiple attention layers. This sheds light on studying more practical and complex tasks beyond our design. Numerical experiments corroborate our theoretical findings.'
volume: 235
URL: https://proceedings.mlr.press/v235/chen24bp.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chen24bp/chen24bp.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chen24bp.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xingwu
family: Chen
- given: Difan
family: Zou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 7972-8001
id: chen24bp
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 7972
lastpage: 8001
published: 2024-07-08 00:00:00 +0000
- title: 'Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context'
abstract: 'Many neural network architectures are known to be Turing Complete, and can thus, in principle, implement arbitrary algorithms. However, Transformers are unique in that they can implement gradient-based learning algorithms *under simple parameter configurations*. This paper provides theoretical and empirical evidence that (non-linear) Transformers naturally learn to implement gradient descent *in function space*, which in turn enables them to learn non-linear functions in context. Our results apply to a broad class of combinations of non-linear architectures and non-linear in-context learning tasks. Additionally, we show that the optimal choice of non-linear activation depends in a natural way on the class of functions that need to be learned.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24a/cheng24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiang
family: Cheng
- given: Yuxin
family: Chen
- given: Suvrit
family: Sra
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8002-8037
id: cheng24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8002
lastpage: 8037
published: 2024-07-08 00:00:00 +0000
- title: 'Layerwise Change of Knowledge in Neural Networks'
abstract: 'This paper aims to explain how a deep neural network (DNN) gradually extracts new knowledge and forgets noisy features through layers in forward propagation. Although there is not yet a consensus on how to define the knowledge encoded by a DNN, previous studies have derived a series of mathematical evidence for taking interactions as symbolic primitive inference patterns encoded by a DNN. We extend the definition of interactions and, for the first time, extract interactions encoded by intermediate layers. We quantify and track the newly emerged interactions and the forgotten interactions in each layer during the forward propagation, which sheds new light on the learning behavior of DNNs. The layer-wise change of interactions also reveals the change of the generalization capacity and instability of feature representations of a DNN.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24b/cheng24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xu
family: Cheng
- given: Lei
family: Cheng
- given: Zhaoran
family: Peng
- given: Yang
family: Xu
- given: Tian
family: Han
- given: Quanshi
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8038-8059
id: cheng24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8038
lastpage: 8059
published: 2024-07-08 00:00:00 +0000
- title: 'Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations'
abstract: 'For partial differential equations on domains of arbitrary shapes, existing works of neural operators attempt to learn a mapping from geometries to solutions. It often requires a large dataset of geometry-solution pairs in order to obtain a sufficiently accurate neural operator. However, for many industrial applications, e.g., engineering design optimization, it can be prohibitive to satisfy the requirement since even a single simulation may take hours or days of computation. To address this issue, we propose *reference neural operators* (RNO), a novel way of implementing neural operators, i.e., to learn the smooth dependence of solutions on geometric deformations. Specifically, given a reference solution, RNO can predict solutions corresponding to arbitrary deformations of the referred geometry. This approach turns out to be much more data efficient. Through extensive experiments, we show that RNO can learn the dependence across various types and different numbers of geometry objects with relatively small datasets. RNO outperforms baseline models in accuracy by a large margin and achieves up to 80% error reduction.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24c/cheng24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ze
family: Cheng
- given: Zhongkai
family: Hao
- given: Xiaoqiang
family: Wang
- given: Jianing
family: Huang
- given: Youjia
family: Wu
- given: Xudan
family: Liu
- given: Yiru
family: Zhao
- given: Songming
family: Liu
- given: Hang
family: Su
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8060-8076
id: cheng24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8060
lastpage: 8076
published: 2024-07-08 00:00:00 +0000
- title: 'Causal Inference out of Control: Estimating Performativity without Treatment Randomization'
abstract: 'Regulators and academics are increasingly interested in the causal effect that algorithmic actions of a digital platform have on user consumption. In pursuit of estimating this effect from observational data, we identify a set of assumptions that permit causal identifiability without assuming randomized platform actions. Our results are applicable to platforms that rely on machine-learning-powered predictions and leverage knowledge from historical data. The key novelty of our approach is to explicitly model the dynamics of consumption over time, exploiting the repeated interaction of digital platforms with their participants to prove our identifiability results. By viewing the platform as a controller acting on a dynamical system, we can show that exogenous variation in consumption and appropriately responsive algorithmic control actions are sufficient for identifying the causal effect of interest. We complement our claims with an analysis of ready-to-use finite sample estimators and empirical investigations. More broadly, our results deriving identifiability conditions tailored to digital platform settings illustrate a fruitful interplay of control theory and causal inference.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24d/cheng24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gary
family: Cheng
- given: Moritz
family: Hardt
- given: Celestine
family: Mendler-Dünner
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8077-8103
id: cheng24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8077
lastpage: 8103
published: 2024-07-08 00:00:00 +0000
- title: 'BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks'
abstract: 'Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models is not sufficiently studied, especially in the black-box scenario. In this work, we introduce the first unified black-box adversarial patch attack framework against pixel-wise regression tasks, aiming to identify the vulnerabilities of these models under query-based black-box attacks. We propose a novel square-based adversarial patch optimization framework and employ probabilistic square sampling and score-based gradient estimation techniques to generate the patch effectively and efficiently, overcoming the scalability problem of previous black-box patch attacks. Our attack prototype, named BadPart, is evaluated on both MDE and OFE tasks, utilizing a total of 7 models. BadPart surpasses 3 baseline methods in terms of both attack performance and efficiency. We also apply BadPart on the Google online service for portrait depth estimation, causing 43.5% relative distance error with 50K queries. State-of-the-art (SOTA) countermeasures cannot defend our attack effectively.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24e/cheng24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhiyuan
family: Cheng
- given: Zhaoyi
family: Liu
- given: Tengda
family: Guo
- given: Shiwei
family: Feng
- given: Dongfang
family: Liu
- given: Mingjie
family: Tang
- given: Xiangyu
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8104-8122
id: cheng24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8104
lastpage: 8122
published: 2024-07-08 00:00:00 +0000
- title: 'GaussianPro: 3D Gaussian Splatting with Progressive Propagation'
abstract: '3D Gaussian Splatting (3DGS) has recently revolutionized the field of neural rendering with its high fidelity and efficiency. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling large-scale scenes that unavoidably contain texture-less surfaces, SfM techniques fail to produce enough points in these surfaces and cannot provide good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classic multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and utilizes patch matching to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method. Our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR. Codes and data are available at https://github.com/kcheng1021/GaussianPro.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24f.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24f/cheng24f.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kai
family: Cheng
- given: Xiaoxiao
family: Long
- given: Kaizhi
family: Yang
- given: Yao
family: Yao
- given: Wei
family: Yin
- given: Yuexin
family: Ma
- given: Wenping
family: Wang
- given: Xuejin
family: Chen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8123-8140
id: cheng24f
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8123
lastpage: 8140
published: 2024-07-08 00:00:00 +0000
- title: 'Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum'
abstract: 'We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our conclusion is two-fold: (i) kernel regressors whose eigenspectrum decays polynomially must generalize well, even in the presence of noisy labeled training data; these models exhibit so-called tempered overfitting; (ii) if the eigenspectrum of any kernel ridge regressor decays exponentially, then it generalizes poorly, i.e., it exhibits catastrophic overfitting. This adds to the available characterization of kernel ridge regressors exhibiting benign overfitting as the extremal case where the eigenspectrum of the kernel decays sub-polynomially. Our analysis combines new random matrix theory (RMT) techniques with recent tools in the kernel ridge regression (KRR) literature.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24g.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24g/cheng24g.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tin Sum
family: Cheng
- given: Aurelien
family: Lucchi
- given: Anastasis
family: Kratsios
- given: David
family: Belius
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8141-8162
id: cheng24g
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8141
lastpage: 8162
published: 2024-07-08 00:00:00 +0000
- title: 'Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior'
abstract: 'This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model by only using output feedback of the model to input queries. Some previous methods improve the query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks due to the adversarial transferability. However, the localized gradient is not informative enough, making these methods still query-intensive. In this paper, we propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks. As the surrogate model contains rich prior information of the black-box one, P-BO models the attack objective with a Gaussian process whose mean function is initialized as the surrogate model’s loss. Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior. Therefore, we further propose an adaptive integration strategy to automatically adjust a coefficient on the function prior by minimizing the regret bound. Extensive experiments on image classifiers and large vision-language models demonstrate the superiority of the proposed algorithm in reducing queries and improving attack success rates compared with the state-of-the-art black-box attacks. Code is available at https://github.com/yibo-miao/PBO-Attack.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24h.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24h/cheng24h.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shuyu
family: Cheng
- given: Yibo
family: Miao
- given: Yinpeng
family: Dong
- given: Xiao
family: Yang
- given: Xiao-Shan
family: Gao
- given: Jun
family: Zhu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8163-8183
id: cheng24h
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8163
lastpage: 8183
published: 2024-07-08 00:00:00 +0000
- title: 'Can AI Assistants Know What They Don’t Know?'
abstract: 'AI assistants powered by Large Language Models (LLMs) have demonstrated impressive performance in various tasks. However, LLMs still make factual errors in knowledge-intensive tasks such as open-domain question answering. These untruthful responses from AI assistants can pose significant risks in practical applications. Therefore, in this paper, we ask the question **Can AI assistants know what they don’t know and express this awareness through natural language?** To investigate this, we construct a model-specific "I don’t know" (Idk) dataset. This dataset includes Supervised Fine-tuning data and preference data, categorizing questions based on whether the assistant knows or does not know the answers. Then, we align the assistant with its corresponding Idk dataset using different alignment methods, including Supervised Fine-tuning and preference optimization. Experimental results show that, after alignment with the Idk dataset, the assistant is more capable of declining to answer questions outside its knowledge scope. The assistant aligned with the Idk dataset shows significantly higher truthfulness than the original assistant.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24i.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24i/cheng24i.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24i.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Qinyuan
family: Cheng
- given: Tianxiang
family: Sun
- given: Xiangyang
family: Liu
- given: Wenwei
family: Zhang
- given: Zhangyue
family: Yin
- given: Shimin
family: Li
- given: Linyang
family: Li
- given: Zhengfu
family: He
- given: Kai
family: Chen
- given: Xipeng
family: Qiu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8184-8202
id: cheng24i
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8184
lastpage: 8202
published: 2024-07-08 00:00:00 +0000
- title: 'RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation'
abstract: 'Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can often be trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24j.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24j/cheng24j.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24j.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zelei
family: Cheng
- given: Xian
family: Wu
- given: Jiahao
family: Yu
- given: Sabrina
family: Yang
- given: Gang
family: Wang
- given: Xinyu
family: Xing
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8203-8228
id: cheng24j
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8203
lastpage: 8228
published: 2024-07-08 00:00:00 +0000
- title: 'RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences'
abstract: 'Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method utilizes a sample selection-based discriminator to dynamically filter out noise and ensure robust training. To counteract the cumulative error stemming from incorrect selection, we suggest a warm start for the reward model, which additionally bridges the performance gap during the transition from pre-training to online training in PbRL. Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24k.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24k/cheng24k.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24k.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jie
family: Cheng
- given: Gang
family: Xiong
- given: Xingyuan
family: Dai
- given: Qinghai
family: Miao
- given: Yisheng
family: Lv
- given: Fei-Yue
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8229-8247
id: cheng24k
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8229
lastpage: 8247
published: 2024-07-08 00:00:00 +0000
- title: 'Kernel Semi-Implicit Variational Inference'
abstract: 'Semi-implicit variational inference (SIVI) extends traditional variational families with semi-implicit distributions defined in a hierarchical manner. Due to the intractable densities of semi-implicit distributions, classical SIVI often resorts to surrogates of evidence lower bound (ELBO) that would introduce biases for training. A recent advancement in SIVI, named SIVI-SM, utilizes an alternative score matching objective made tractable via a minimax formulation, albeit requiring an additional lower-level optimization. In this paper, we propose kernel SIVI (KSIVI), a variant of SIVI-SM that eliminates the need for the lower-level optimization through kernel tricks. Specifically, we show that when optimizing over a reproducing kernel Hilbert space (RKHS), the lower-level problem has an explicit solution. This way, the upper-level objective becomes the kernel Stein discrepancy (KSD), which is readily computable for stochastic gradient descent due to the hierarchical structure of semi-implicit variational distributions. An upper bound for the variance of the Monte Carlo gradient estimators of the KSD objective is derived, which allows us to establish novel convergence guarantees of KSIVI. We demonstrate the effectiveness and efficiency of KSIVI on both synthetic distributions and a variety of real data Bayesian inference tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheng24l.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheng24l/cheng24l.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheng24l.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ziheng
family: Cheng
- given: Longlin
family: Yu
- given: Tianyu
family: Xie
- given: Shiyue
family: Zhang
- given: Cheng
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8248-8269
id: cheng24l
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8248
lastpage: 8269
published: 2024-07-08 00:00:00 +0000
- title: 'Creative Text-to-Audio Generation via Synthesizer Programming'
abstract: 'Neural audio synthesis methods now allow specifying ideas in natural language. However, these methods produce results that cannot be easily tweaked, as they are based on large latent spaces and up to billions of uninterpretable parameters. We propose a text-to-audio generation method that leverages a virtual modular sound synthesizer with only 78 parameters. Synthesizers have long been used by skilled sound designers for media like music and film due to their flexibility and intuitive controls. Our method, CTAG, iteratively updates a synthesizer’s parameters to produce high-quality audio renderings of text prompts that can be easily inspected and tweaked. Sounds produced this way are also more abstract, capturing essential conceptual features over fine-grained acoustic details, akin to how simple sketches can vividly convey visual concepts. Our results show how CTAG produces sounds that are distinctive, perceived as artistic, and yet similarly identifiable to recent neural audio synthesis models, positioning it as a valuable and complementary tool.'
volume: 235
URL: https://proceedings.mlr.press/v235/cherep24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cherep24a/cherep24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cherep24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Manuel
family: Cherep
- given: Nikhil
family: Singh
- given: Jessica
family: Shand
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8270-8285
id: cherep24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8270
lastpage: 8285
published: 2024-07-08 00:00:00 +0000
- title: 'Leveraging (Biased) Information: Multi-armed Bandits with Offline Data'
abstract: 'We leverage offline data to facilitate online learning in stochastic multi-armed bandits. The probability distributions that govern the offline data and the online rewards can be different. Without any non-trivial upper bound on their difference, we show that no non-anticipatory policy can outperform the UCB policy of Auer et al. (2002), even in the presence of offline data. In complement, we propose an online policy MIN-UCB, which outperforms UCB when a non-trivial upper bound is given. MIN-UCB adaptively chooses to utilize the offline data when they are deemed informative, and to ignore them otherwise. MIN-UCB is shown to be tight in terms of both instance-independent and instance-dependent regret bounds. Finally, we corroborate the theoretical results with numerical experiments.'
volume: 235
URL: https://proceedings.mlr.press/v235/cheung24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cheung24a/cheung24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cheung24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wang Chi
family: Cheung
- given: Lixing
family: Lyu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8286-8309
id: cheung24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8286
lastpage: 8309
published: 2024-07-08 00:00:00 +0000
- title: 'Language Models as Science Tutors'
abstract: 'NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use-cases of LMs for science, including applications in education that require processing long scientific documents. To address this, we introduce TutorEval and TutorChat. TutorEval is a diverse question-answering benchmark consisting of questions about long chapters from STEM textbooks, written by experts. TutorEval helps measure real-life usability of LMs as scientific assistants, and it is the first benchmark combining long contexts, free-form generation, and multi-disciplinary scientific knowledge. Moreover, we show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval. Therefore, we create TutorChat, a dataset of 80,000 long synthetic dialogues about textbooks. We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH. Our datasets build on open-source materials, and we release our models, data, and evaluations publicly.'
volume: 235
URL: https://proceedings.mlr.press/v235/chevalier24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chevalier24a/chevalier24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chevalier24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexis
family: Chevalier
- given: Jiayi
family: Geng
- given: Alexander
family: Wettig
- given: Howard
family: Chen
- given: Sebastian
family: Mizera
- given: Toni
family: Annala
- given: Max
family: Aragon
- given: Arturo Rodriguez
family: Fanlo
- given: Simon
family: Frieder
- given: Simon
family: Machado
- given: Akshara
family: Prabhakar
- given: Ellie
family: Thieu
- given: Jiachen T.
family: Wang
- given: Zirui
family: Wang
- given: Xindi
family: Wu
- given: Mengzhou
family: Xia
- given: Wenhan
family: Xia
- given: Jiatong
family: Yu
- given: Junjie
family: Zhu
- given: Zhiyong
family: Ren
- given: Sanjeev
family: Arora
- given: Danqi
family: Chen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8310-8335
id: chevalier24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8310
lastpage: 8335
published: 2024-07-08 00:00:00 +0000
- title: 'Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning'
abstract: 'In this paper, we focus on single-demonstration imitation learning (IL), a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible and the ground truth reward function is not available. In contrast to typical IL settings with multiple demonstrations, single-demonstration IL involves an agent having access to only one expert trajectory. We highlight the issue of sparse reward signals in this setting and propose to mitigate this issue through our proposed Transition Discriminator-based IL (TDIL) method. TDIL is an IRL method designed to address reward sparsity by introducing a denser surrogate reward function that considers environmental dynamics. This surrogate reward function encourages the agent to navigate towards states that are proximal to expert states. In practice, TDIL trains a transition discriminator to differentiate between valid and non-valid transitions in a given environment to compute the surrogate rewards. The experiments demonstrate that TDIL outperforms existing IL approaches and achieves expert-level performance in the single-demonstration IL setting across five widely adopted MuJoCo benchmarks as well as the "Adroit Door" robotic environment.'
volume: 235
URL: https://proceedings.mlr.press/v235/chiang24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chiang24a/chiang24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chiang24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chia-Cheng
family: Chiang
- given: Li-Cheng
family: Lan
- given: Wei-Fang
family: Sun
- given: Chien
family: Feng
- given: Cho-Jui
family: Hsieh
- given: Chun-Yi
family: Lee
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8336-8358
id: chiang24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8336
lastpage: 8358
published: 2024-07-08 00:00:00 +0000
- title: 'Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference'
abstract: 'Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing. The platform has been operational for several months, amassing over 240K votes. This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using for efficient and accurate evaluation and ranking of models. We confirm that the crowdsourced questions are sufficiently diverse and discriminating and that the crowd-sourced human votes are in good agreement with those of expert raters. These analyses collectively establish a robust foundation for the credibility of Chatbot Arena. Because of its unique value and openness, Chatbot Arena has emerged as one of the most referenced LLM leaderboards, widely cited by leading LLM developers and companies. The platform is publicly available at https://chat.lmsys.org.'
volume: 235
URL: https://proceedings.mlr.press/v235/chiang24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chiang24b/chiang24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chiang24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wei-Lin
family: Chiang
- given: Lianmin
family: Zheng
- given: Ying
family: Sheng
- given: Anastasios Nikolas
family: Angelopoulos
- given: Tianle
family: Li
- given: Dacheng
family: Li
- given: Banghua
family: Zhu
- given: Hao
family: Zhang
- given: Michael
family: Jordan
- given: Joseph E.
family: Gonzalez
- given: Ion
family: Stoica
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8359-8388
id: chiang24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8359
lastpage: 8388
published: 2024-07-08 00:00:00 +0000
- title: 'MS-TIP: Imputation Aware Pedestrian Trajectory Prediction'
abstract: 'Pedestrian trajectory prediction aims to predict future trajectories based on observed trajectories. Current state-of-the-art methods often assume that the observed sequences of agents are complete, which is a strong assumption that overlooks inherent uncertainties. Understanding pedestrian behavior when dealing with missing values in the observed sequence is crucial for enhancing the performance of predictive models. In this work, we propose the MultiScale hypergraph for Trajectory Imputation and Prediction (MS-TIP), a novel approach that simultaneously addresses the imputation of missing observations and the prediction of future trajectories. Specifically, we leverage transformers with diagonal masked self-attention to impute incomplete observations. Further, our approach promotes complex interaction modeling through multi-scale hypergraphs, optimizing our trajectory prediction module to capture different types of interactions. With the inclusion of scenic attention, we learn contextual scene information, instead of sole reliance on coordinates. Additionally, our approach utilizes an intermediate control point and refinement module to infer future trajectories accurately. Extensive experiments validate the efficacy of MS-TIP in precisely predicting pedestrian future trajectories. Code is publicly available at https://github.com/Pranav-chib/MS-TIP.'
volume: 235
URL: https://proceedings.mlr.press/v235/chib24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chib24a/chib24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chib24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pranav Singh
family: Chib
- given: Achintya
family: Nath
- given: Paritosh
family: Kabra
- given: Ishu
family: Gupta
- given: Pravendra
family: Singh
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8389-8402
id: chib24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8389
lastpage: 8402
published: 2024-07-08 00:00:00 +0000
- title: 'Enhancing Trajectory Prediction through Self-Supervised Waypoint Distortion Prediction'
abstract: 'Trajectory prediction is an important task that involves modeling the indeterminate nature of agents to forecast future trajectories given the observed trajectory sequences. The task of predicting trajectories poses significant challenges, as agents not only move individually through time but also interact spatially. The learning of complex spatio-temporal representations stands as a fundamental challenge in trajectory prediction. To this end, we propose a novel approach called SSWDP (Self-Supervised Waypoint Distortion Prediction). We propose a simple yet highly effective self-supervised task of predicting distortion present in the observed trajectories to improve the representation learning of the model. Our approach can complement existing trajectory prediction methods. The experimental results highlight a significant improvement with relative percentage differences of 22.7%/38.9%, 33.8%/36.4%, and 16.60%/23.20% in ADE/FDE for the NBA, TrajNet++, and ETH-UCY datasets, respectively, compared to the baseline methods. Our approach also demonstrates a significant improvement over baseline methods with relative percentage differences of 76.8%/82.5% and 61.0%/36.1% in ADE/FDE for TrajNet++ and NBA datasets in distorted environments, respectively.'
volume: 235
URL: https://proceedings.mlr.press/v235/chib24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chib24b/chib24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chib24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pranav Singh
family: Chib
- given: Pravendra
family: Singh
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8403-8416
id: chib24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8403
lastpage: 8416
published: 2024-07-08 00:00:00 +0000
- title: 'How Flawed Is ECE? An Analysis via Logit Smoothing'
abstract: 'Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fundamental are these issues, and what are their impacts on existing results? Towards this end, we completely characterize the discontinuities of ECE with respect to general probability measures on Polish spaces. We then use the nature of these discontinuities to motivate a novel *continuous, easily estimated* miscalibration metric, which we term *Logit-Smoothed ECE (LS-ECE)*. By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice.'
volume: 235
URL: https://proceedings.mlr.press/v235/chidambaram24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chidambaram24a/chidambaram24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chidambaram24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Muthu
family: Chidambaram
- given: Holden
family: Lee
- given: Colin
family: Mcswiggen
- given: Semon
family: Rezchikov
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8417-8435
id: chidambaram24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8417
lastpage: 8435
published: 2024-07-08 00:00:00 +0000
- title: 'Safe Exploration in Dose Finding Clinical Trials with Heterogeneous Participants'
abstract: 'In drug development, early phase dose-finding clinical trials are carried out to identify an optimal dose to administer to patients in larger confirmatory clinical trials. Standard trial procedures do not optimize for participant benefit and do not consider participant heterogeneity, despite consequences to participants’ health and downstream impacts to under-represented population subgroups. Many novel drugs also do not obey parametric modelling assumptions made in common dose-finding procedures. We present Safe Allocation for Exploration of Treatments (SAFE-T), a procedure for adaptive dose-finding that adheres to safety constraints, improves utility for heterogeneous participants, and works well with small sample sizes. SAFE-T flexibly learns non-parametric multi-output Gaussian process models for dose toxicity and efficacy, using Bayesian optimization, and provides accurate final dose recommendations. We provide theoretical guarantees for the satisfaction of safety constraints. Using a comprehensive set of realistic synthetic scenarios, we demonstrate empirically that SAFE-T generally outperforms comparable methods and maintains performance across variations in sample size and subgroup distribution. Finally, we extend SAFE-T to a new adaptive setting, demonstrating its potential to improve traditional clinical trial procedures.'
volume: 235
URL: https://proceedings.mlr.press/v235/chien24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chien24a/chien24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chien24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Isabel
family: Chien
- given: Wessel P
family: Bruinsma
- given: Javier
family: Gonzalez
- given: Richard E.
family: Turner
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8436-8467
id: chien24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8436
lastpage: 8467
published: 2024-07-08 00:00:00 +0000
- title: 'Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts'
abstract: 'Text-to-image diffusion models, e.g. Stable Diffusion (SD), lately have shown remarkable ability in high-quality content generation, and become one of the representatives for the recent wave of transformative AI. Nevertheless, such advance comes with an intensifying concern about the misuse of this generative technology, especially for producing copyrighted or NSFW (i.e. not safe for work) images. Although efforts have been made to filter inappropriate images/prompts or remove undesirable concepts/styles via model fine-tuning, the reliability of these safety mechanisms against diversified problematic prompts remains largely unexplored. In this work, we propose **Prompting4Debugging (P4D)** as a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models to test the reliability of a deployed safety mechanism. We demonstrate the efficacy of our P4D tool in uncovering new vulnerabilities of SD models with safety mechanisms. Particularly, our result shows that around half of prompts in existing safe prompting benchmarks which were originally considered "safe" can actually be manipulated to bypass many deployed safety mechanisms, including concept removal, negative prompt, and safety guidance. Our findings suggest that, without comprehensive testing, the evaluations on limited safe prompting benchmarks can lead to a false sense of safety for text-to-image models.'
volume: 235
URL: https://proceedings.mlr.press/v235/chin24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chin24a/chin24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhi-Yi
family: Chin
- given: Chieh Ming
family: Jiang
- given: Ching-Chun
family: Huang
- given: Pin-Yu
family: Chen
- given: Wei-Chen
family: Chiu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8468-8486
id: chin24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8468
lastpage: 8486
published: 2024-07-08 00:00:00 +0000
- title: 'Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams'
abstract: 'We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, peeking with expectation-based averaged capital (PEAK), builds upon the testing-by-betting framework and provides a non-asymptotic $\alpha$-level test across any stopping time. Our contributions are two-fold: (1) we propose a novel betting scheme and provide theoretical guarantees on type-I error control, power, and asymptotic growth rate/$e$-power in the setting of a single data stream; (2) we introduce PEAK, a generalization of this betting scheme to multiple streams, that (i) avoids using wasteful union bounds via averaging, (ii) is a test of power one under mild regularity conditions on the sampling scheme of the streams, and (iii) reduces computational overhead when applying the testing-as-betting approaches for pure-exploration bandit problems. We illustrate the practical benefits of PEAK using both synthetic and real-world HeartSteps datasets. Our experiments show that PEAK provides up to an 85% reduction in the number of samples before stopping compared to existing stopping rules for pure-exploration bandit problems, and matches the performance of state-of-the-art sequential tests while improving upon computational complexity.'
volume: 235
URL: https://proceedings.mlr.press/v235/cho24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cho24a/cho24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cho24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brian M
family: Cho
- given: Kyra
family: Gan
- given: Nathan
family: Kallus
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8487-8509
id: cho24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8487
lastpage: 8509
published: 2024-07-08 00:00:00 +0000
- title: 'Parameterized Physics-informed Neural Networks for Parameterized PDEs'
abstract: 'Complex physical systems are often described by partial differential equations (PDEs) that depend on parameters such as the Reynolds number in fluid mechanics. In applications such as design optimization or uncertainty quantification, solutions of those PDEs need to be evaluated at numerous points in the parameter space. While physics-informed neural networks (PINNs) have emerged as a new strong competitor as a surrogate, their usage in this scenario remains underexplored due to the inherent need for repetitive and time-consuming training. In this paper, we address this problem by proposing a novel extension, parameterized physics-informed neural networks (P$^2$INNs). P$^2$INNs enable modeling the solutions of parameterized PDEs via explicitly encoding a latent representation of PDE parameters. With the extensive empirical evaluation, we demonstrate that P$^2$INNs outperform the baselines both in accuracy and parameter efficiency on benchmark 1D and 2D parameterized PDEs and are also effective in overcoming the known “failure modes”.'
volume: 235
URL: https://proceedings.mlr.press/v235/cho24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cho24b/cho24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cho24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Woojin
family: Cho
- given: Minju
family: Jo
- given: Haksoo
family: Lim
- given: Kookjin
family: Lee
- given: Dongeun
family: Lee
- given: Sanghyun
family: Hong
- given: Noseong
family: Park
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8510-8533
id: cho24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8510
lastpage: 8533
published: 2024-07-08 00:00:00 +0000
- title: 'Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters'
abstract: 'When estimating target parameters in nonparametric models with nuisance parameters, substituting the unknown nuisances with nonparametric estimators can introduce "plug-in bias." Traditional methods addressing this suboptimal bias-variance trade-off rely on the influence function (IF) of the target parameter. When estimating multiple target parameters, these methods require debiasing the nuisance parameter multiple times using the corresponding IFs, which poses analytical and computational challenges. In this work, we leverage the targeted maximum likelihood estimation (TMLE) framework to propose a novel method named kernel debiased plug-in estimation (KDPE). KDPE refines an initial estimate through regularized likelihood maximization steps, employing a nonparametric model based on reproducing kernel Hilbert spaces. We show that KDPE: (i) simultaneously debiases all pathwise differentiable target parameters that satisfy our regularity conditions, (ii) does not require the IF for implementation, and (iii) remains computationally tractable. We numerically illustrate the use of KDPE and validate our theoretical results.'
volume: 235
URL: https://proceedings.mlr.press/v235/cho24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cho24c/cho24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cho24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brian M
family: Cho
- given: Yaroslav
family: Mukhin
- given: Kyra
family: Gan
- given: Ivana
family: Malenica
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8534-8555
id: cho24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8534
lastpage: 8555
published: 2024-07-08 00:00:00 +0000
- title: 'Hard Tasks First: Multi-Task Reinforcement Learning Through Task Scheduling'
abstract: 'Multi-task reinforcement learning (RL) faces the significant challenge of varying task difficulties, often leading to negative transfer when simpler tasks overshadow the learning of more complex ones. To overcome this challenge, we propose a novel algorithm, Scheduled Multi-Task Training (SMT), that strategically prioritizes more challenging tasks, thereby enhancing overall learning efficiency. SMT introduces a dynamic task prioritization strategy, underpinned by an effective metric for assessing task difficulty. This metric ensures an efficient and targeted allocation of training resources, significantly improving learning outcomes. Additionally, SMT incorporates a reset mechanism that periodically reinitializes key network parameters to mitigate the simplicity bias, further enhancing the adaptability and robustness of the learning process across diverse tasks. The efficacy of SMT’s scheduling method is validated by significantly improving performance on challenging Meta-World benchmarks.'
volume: 235
URL: https://proceedings.mlr.press/v235/cho24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cho24d/cho24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cho24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Myungsik
family: Cho
- given: Jongeui
family: Park
- given: Suyoung
family: Lee
- given: Youngchul
family: Sung
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8556-8577
id: cho24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8556
lastpage: 8577
published: 2024-07-08 00:00:00 +0000
- title: 'KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation'
abstract: 'Large Language Model or LLM inference has two phases, the prompt (or prefill) phase to output the first token and the extension (or decoding) phase to generate subsequent tokens. In this work, we propose an efficient parallelization scheme, KV-Runahead, to accelerate the prompt phase. The key observation is that the extension phase generates tokens faster than the prompt phase because of the key-value cache (KV-cache). Hence, KV-Runahead parallelizes the prompt phase by orchestrating multiple processes to populate the KV-cache and minimizes the time-to-first-token (TTFT). Dual-purposing the KV-cache scheme has two main benefits. First, since the KV-cache is designed to leverage the causal attention map, we minimize computation and communication automatically. Second, since it already exists for the extension phase, KV-Runahead is easy to implement. We further propose context-level load-balancing to handle uneven KV-cache generation (due to the causal attention) and to optimize TTFT. Compared with an existing parallelization scheme such as tensor or sequential parallelization, where keys and values are locally generated and exchanged via all-gather collectives, our experimental results demonstrate that KV-Runahead can offer over 1.4× and 1.6× speedups for Llama 7B and Falcon 7B, respectively.'
volume: 235
URL: https://proceedings.mlr.press/v235/cho24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cho24e/cho24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cho24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Minsik
family: Cho
- given: Mohammad
family: Rastegari
- given: Devang
family: Naik
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8578-8592
id: cho24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8578
lastpage: 8592
published: 2024-07-08 00:00:00 +0000
- title: 'Neurodegenerative Brain Network Classification via Adaptive Diffusion with Temporal Regularization'
abstract: 'Analysis of neurodegenerative diseases on brain connectomes is important in facilitating early diagnosis and predicting their onset. However, investigation of the progressive and irreversible dynamics of these diseases remains underexplored in cross-sectional studies, as their diagnostic groups are considered independent. Also, as in many real-world graphs, brain networks exhibit intricate structures with both homophily and heterophily. To address these challenges, we propose Adaptive Graph diffusion network with Temporal regularization (AGT). AGT introduces node-wise convolution to adaptively capture low-frequency (i.e., homophily) and high-frequency (i.e., heterophily) characteristics within an optimally tailored range for each node. Moreover, AGT captures sequential variations within progressive diagnostic groups with a novel temporal regularization, considering the relative feature distance between the groups in the latent space. As a result, our proposed model yields interpretable results at both the node and group levels. The superiority of our method is validated on two neurodegenerative disease benchmarks for graph classification: the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Parkinson’s Progression Markers Initiative (PPMI) datasets.'
volume: 235
URL: https://proceedings.mlr.press/v235/cho24f.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cho24f/cho24f.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cho24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hyuna
family: Cho
- given: Jaeyoon
family: Sim
- given: Guorong
family: Wu
- given: Won Hwa
family: Kim
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8593-8608
id: cho24f
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8593
lastpage: 8608
published: 2024-07-08 00:00:00 +0000
- title: 'Tilt and Average: Geometric Adjustment of the Last Layer for Recalibration'
abstract: 'After the revelation that neural networks tend to produce overconfident predictions, the problem of calibration, which aims to align confidence with accuracy to enhance the reliability of predictions, has gained significant importance. Several solutions based on calibration maps have been proposed to address the problem of recalibrating a trained classifier using additional datasets. In this paper, we offer an algorithm that transforms the weights of the last layer of the classifier, distinct from the calibration-map-based approach. We concentrate on the geometry of the final linear layer, specifically its angular aspect, and adjust the weights of the corresponding layer. We name the method Tilt and Average, and validate the calibration effect empirically and theoretically. Through this, we demonstrate that our approach, in addition to the existing calibration-map-based techniques, can yield improved calibration performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/cho24g.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cho24g/cho24g.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cho24g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gyusang
family: Cho
- given: Chan-Hyun
family: Youn
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8609-8628
id: cho24g
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8609
lastpage: 8628
published: 2024-07-08 00:00:00 +0000
- title: 'Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport'
abstract: 'Wasserstein gradient flow (WGF) describes the gradient dynamics of probability density within the Wasserstein space. WGF provides a promising approach for conducting optimization over probability distributions. Numerically approximating the continuous WGF requires a time discretization method, the most well-known of which is the JKO scheme. In this regard, previous WGF models employ the JKO scheme and a parametrized transport map for each JKO step. However, this approach results in quadratic training complexity $O(K^2)$ in the number of JKO steps $K$, which severely limits the scalability of WGF models. In this paper, we introduce a scalable WGF-based generative model, called Semi-dual JKO (S-JKO). Our model is based on the semi-dual form of the JKO step, derived from the equivalence between the JKO step and Unbalanced Optimal Transport. Our approach reduces the training complexity to $O(K)$. We demonstrate that our model significantly outperforms existing WGF-based generative models, achieving FID scores of 2.62 on CIFAR-10 and 6.42 on CelebA-HQ-256, which are comparable to state-of-the-art image generative models.'
volume: 235
URL: https://proceedings.mlr.press/v235/choi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/choi24a/choi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-choi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jaemoo
family: Choi
- given: Jaewoong
family: Choi
- given: Myungjoo
family: Kang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8629-8650
id: choi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8629
lastpage: 8650
published: 2024-07-08 00:00:00 +0000
- title: 'Listwise Reward Estimation for Offline Preference-based Reinforcement Learning'
abstract: 'In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. However, existing PbRL methods have limitations as they often overlook the *second-order* preference that indicates the relative strength of preference. In this paper, we propose Listwise Reward Estimation (LiRE), a novel approach for offline PbRL that leverages second-order preference information by constructing a Ranked List of Trajectories (RLT), which can be efficiently built using the same ternary feedback type as traditional methods. To validate the effectiveness of LiRE, we propose a new offline PbRL dataset that objectively reflects the effect of the estimated rewards. Our extensive experiments on the dataset demonstrate the superiority of LiRE, *i.e.,* outperforming state-of-the-art baselines even with modest feedback budgets and enjoying robustness with respect to the amount of feedback and feedback noise. Our code is available at https://github.com/chwoong/LiRE.'
volume: 235
URL: https://proceedings.mlr.press/v235/choi24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/choi24b/choi24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-choi24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Heewoong
family: Choi
- given: Sangwon
family: Jung
- given: Hongjoon
family: Ahn
- given: Taesup
family: Moon
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8651-8671
id: choi24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8651
lastpage: 8671
published: 2024-07-08 00:00:00 +0000
- title: 'BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges'
abstract: 'Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training, addressing challenges associated with training neural networks on large-scale datasets. However, existing methods tend to specialize in either high or low selection ratio regimes, lacking a universal approach that consistently achieves competitive performance across a broad range of selection ratios. We introduce a universal and efficient data subset selection method, Best Window Selection (BWS), by proposing a method to choose the best window subset from samples ordered based on their difficulty scores. This approach offers flexibility by allowing the choice of window intervals that span from easy to difficult samples. Furthermore, we provide an efficient mechanism for selecting the best window subset by evaluating its quality using kernel ridge regression. Our experimental results demonstrate the superior performance of BWS compared to other baselines across a broad range of selection ratios over datasets, including CIFAR-10/100 and ImageNet, and the scenarios involving training from random initialization or fine-tuning of pre-trained models.'
volume: 235
URL: https://proceedings.mlr.press/v235/choi24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/choi24c/choi24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-choi24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hoyong
family: Choi
- given: Nohyun
family: Ki
- given: Hye Won
family: Chung
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8672-8701
id: choi24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8672
lastpage: 8701
published: 2024-07-08 00:00:00 +0000
- title: 'Embodied CoT Distillation From LLM To Off-the-shelf Agents'
abstract: 'We address the challenge of utilizing large language models (LLMs) for complex embodied tasks in environments where decision-making systems must operate in a timely manner on capacity-limited, off-the-shelf devices. We present DeDer, a framework for decomposing and distilling the embodied reasoning capabilities of LLMs into efficient, small language model (sLM)-based policies. In DeDer, the decision-making process of LLM-based strategies is restructured into a hierarchy with a reasoning-policy and a planning-policy. The reasoning-policy is distilled from data generated through the embodied in-context learning and self-verification of an LLM, so it can produce effective rationales. The planning-policy, guided by the rationales, can render optimized plans efficiently. In turn, DeDer allows both policies to be implemented with sLMs and deployed on off-the-shelf devices. Furthermore, to enhance the quality of intermediate rationales specific to embodied tasks, we devise an embodied knowledge graph, and to generate multiple rationales in a timely manner through a single inference, we also use a contrastively prompted attention model. Our experiments with the ALFRED benchmark demonstrate that DeDer surpasses leading language planning and distillation approaches, indicating the applicability and efficiency of sLM-based embodied policies derived through DeDer.'
volume: 235
URL: https://proceedings.mlr.press/v235/choi24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/choi24d/choi24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-choi24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wonje
family: Choi
- given: Woo Kyung
family: Kim
- given: Minjong
family: Yoo
- given: Honguk
family: Woo
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8702-8721
id: choi24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8702
lastpage: 8721
published: 2024-07-08 00:00:00 +0000
- title: 'PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning'
abstract: 'Large Language Models (LLMs) are trained on massive text corpora, which are encoded with diverse personality traits. This triggers an interesting goal of eliciting a desired personality trait from the LLM, and probing its behavioral preferences. Accordingly, we formalize the persona elicitation task, aiming to customize LLM behaviors to align with a target persona. We present Persona In-Context Learning (PICLe), a novel persona elicitation framework grounded in Bayesian inference. At the core, PICLe introduces a new ICL example selection criterion based on likelihood ratio, which is designed to optimally guide the model in eliciting a specific target persona. We demonstrate the effectiveness of PICLe through extensive comparisons against baseline methods across three contemporary LLMs. Code is available at https://github.com/deeplearning-wisc/picle.'
volume: 235
URL: https://proceedings.mlr.press/v235/choi24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/choi24e/choi24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-choi24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hyeong Kyu
family: Choi
- given: Yixuan
family: Li
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8722-8739
id: choi24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8722
lastpage: 8739
published: 2024-07-08 00:00:00 +0000
- title: 'PANDA: Expanded Width-Aware Message Passing Beyond Rewiring'
abstract: 'Recent research in the field of graph neural networks (GNNs) has identified a critical issue known as "over-squashing," resulting from the bottleneck phenomenon in graph structures, which impedes the propagation of long-range information. Prior works have proposed a variety of graph rewiring concepts that aim at optimizing the spatial or spectral properties of graphs to promote signal propagation. However, such approaches inevitably deteriorate the original graph topology, which may lead to a distortion of information flow. To address this, we introduce ex**pand**ed width-**a**ware (**PANDA**) message passing, a new message passing paradigm in which nodes with high centrality, a potential source of over-squashing, are selectively expanded in width to encapsulate the growing influx of signals from distant nodes. Experimental results show that our method outperforms existing rewiring methods, suggesting that selectively expanding the hidden state of nodes can be a compelling alternative to graph rewiring for addressing over-squashing.'
volume: 235
URL: https://proceedings.mlr.press/v235/choi24f.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/choi24f/choi24f.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-choi24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jeongwhan
family: Choi
- given: Sumin
family: Park
- given: Hyowon
family: Wi
- given: Sung-Bae
family: Cho
- given: Noseong
family: Park
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8740-8761
id: choi24f
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8740
lastpage: 8761
published: 2024-07-08 00:00:00 +0000
- title: 'Online bipartite matching with imperfect advice'
abstract: 'We study the problem of online unweighted bipartite matching with $n$ offline vertices and $n$ online vertices, where one wishes to be competitive against the optimal offline algorithm. While the classic RANKING algorithm of (Karp et al., 1990) provably attains a competitive ratio of $1-1/e > 1/2$, we show that no learning-augmented method can be both 1-consistent and strictly better than 1/2-robust under the adversarial arrival model. Meanwhile, under the random arrival model, we show how one can utilize methods from distribution testing to design an algorithm that takes in external advice about the online vertices and provably achieves a competitive ratio interpolating between any ratio attainable by advice-free methods and the optimal ratio of 1, depending on the advice quality.'
volume: 235
URL: https://proceedings.mlr.press/v235/choo24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/choo24a/choo24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-choo24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Davin
family: Choo
- given: Themistoklis
family: Gouleakis
- given: Chun Kai
family: Ling
- given: Arnab
family: Bhattacharyya
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8762-8781
id: choo24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8762
lastpage: 8781
published: 2024-07-08 00:00:00 +0000
- title: 'A connection between Tempering and Entropic Mirror Descent'
abstract: 'This paper explores the connections between tempering (for Sequential Monte Carlo; SMC) and entropic mirror descent to sample from a target probability distribution whose unnormalized density is known. We establish that tempering SMC corresponds to entropic mirror descent applied to the reverse Kullback-Leibler (KL) divergence and obtain convergence rates for the tempering iterates. Our result motivates the tempering iterates from an optimization point of view, showing that tempering can be seen as a descent scheme of the KL divergence with respect to the Fisher-Rao geometry, in contrast to Langevin dynamics that perform descent of the KL with respect to the Wasserstein-2 geometry. We exploit the connection between tempering and mirror descent iterates to justify common practices in SMC and derive adaptive tempering rules that improve over other alternative benchmarks in the literature.'
volume: 235
URL: https://proceedings.mlr.press/v235/chopin24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chopin24a/chopin24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chopin24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nicolas
family: Chopin
- given: Francesca
family: Crucinio
- given: Anna
family: Korba
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8782-8800
id: chopin24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8782
lastpage: 8800
published: 2024-07-08 00:00:00 +0000
- title: 'Learning Linear Block Error Correction Codes'
abstract: 'Error correction codes are a crucial part of the physical communication layer, ensuring the reliable transfer of data over noisy channels. The design of optimal linear block codes capable of being efficiently decoded is of major concern, especially for short block lengths. While neural decoders have recently demonstrated their advantage over classical decoding techniques, the neural design of the codes remains a challenge. In this work, we propose for the first time a unified encoder-decoder training of binary linear block codes. To this end, we adapt the coding setting to support efficient and differentiable training of the code for end-to-end optimization over the order two Galois field. We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient. Our results show that (i) the proposed decoder outperforms existing neural decoding on conventional codes, (ii) the suggested framework generates codes that outperform the analogous conventional codes, and (iii) the codes we developed not only excel with our decoder but also show enhanced performance with traditional decoding techniques.'
volume: 235
URL: https://proceedings.mlr.press/v235/choukroun24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/choukroun24a/choukroun24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-choukroun24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yoni
family: Choukroun
- given: Lior
family: Wolf
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8801-8814
id: choukroun24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8801
lastpage: 8814
published: 2024-07-08 00:00:00 +0000
- title: 'A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts'
abstract: 'The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks (experts) through trainable routers. MoE reduces the training computation significantly for large models, but its deployment can still be memory/computation-expensive for some downstream tasks. Model pruning is a popular approach to reduce inference computation, but its application to the MoE architecture is largely unexplored. To the best of our knowledge, this paper provides the first provably efficient technique for pruning experts in fine-tuned MoE models. We theoretically prove that prioritizing the pruning of the experts with a smaller change of the router’s $l_2$ norm from the pre-trained model guarantees the preservation of test accuracy, while significantly reducing the model size and the computational requirements. Although our theoretical analysis is centered on binary classification tasks on a simplified MoE architecture, our expert pruning method is verified on large vision MoE models such as V-MoE and $\text{E}^3$-MoE fine-tuned on benchmark datasets such as CIFAR-10, CIFAR-100, and ImageNet.'
volume: 235
URL: https://proceedings.mlr.press/v235/chowdhury24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chowdhury24a/chowdhury24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chowdhury24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mohammed Nowaz Rabbani
family: Chowdhury
- given: Meng
family: Wang
- given: Kaoutar
family: El Maghraoui
- given: Naigang
family: Wang
- given: Pin-Yu
family: Chen
- given: Christopher
family: Carothers
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8815-8847
id: chowdhury24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8815
lastpage: 8847
published: 2024-07-08 00:00:00 +0000
- title: 'SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity'
abstract: 'While stochastic bilevel optimization methods have been extensively studied for addressing large-scale nested optimization problems in machine learning, it remains an open question whether the optimal complexity bounds for solving bilevel optimization are the same as those in single-level optimization. Our main result resolves this question: SPABA, an adaptation of the PAGE method for nonconvex optimization in (Li et al., 2021) to the bilevel setting, can achieve optimal sample complexity in both the finite-sum and expectation settings. We show the optimality of SPABA by proving that there is no gap in complexity analysis between stochastic bilevel and single-level optimization when implementing PAGE. Notably, as indicated by the results of (Dagréou et al., 2022), there might exist a gap in complexity analysis when implementing other stochastic gradient estimators, like SGD and SAGA. In addition to SPABA, we propose several other single-loop stochastic bilevel algorithms, that either match or improve the state-of-the-art sample complexity results, leveraging our convergence rate and complexity analysis. Numerical experiments demonstrate the superior practical performance of the proposed methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/chu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chu24a/chu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tianshu
family: Chu
- given: Dachuan
family: Xu
- given: Wei
family: Yao
- given: Jin
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8848-8903
id: chu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8848
lastpage: 8903
published: 2024-07-08 00:00:00 +0000
- title: 'How Private are DP-SGD Implementations?'
abstract: 'We demonstrate a substantial gap between the privacy guarantees of the Adaptive Batch Linear Queries (ABLQ) mechanism under different types of batch sampling: (i) Shuffling, and (ii) Poisson subsampling; the typical analysis of Differentially Private Stochastic Gradient Descent (DP-SGD) follows by interpreting it as a post-processing of ABLQ. While shuffling-based DP-SGD is more commonly used in practical implementations, it has not been amenable to easy privacy analysis, either analytically or even numerically. On the other hand, Poisson subsampling-based DP-SGD is challenging to implement scalably, but has a well-understood privacy analysis, with multiple open-source numerically tight privacy accountants available. This has led to the common practice of using shuffling-based DP-SGD while reporting the privacy parameters derived for the corresponding Poisson subsampling version. Our result shows that there can be a substantial gap between the privacy analyses under the two types of batch sampling, and thus advises caution in reporting privacy parameters for DP-SGD.'
volume: 235
URL: https://proceedings.mlr.press/v235/chua24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chua24a/chua24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chua24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lynn
family: Chua
- given: Badih
family: Ghazi
- given: Pritish
family: Kamath
- given: Ravi
family: Kumar
- given: Pasin
family: Manurangsi
- given: Amer
family: Sinha
- given: Chiyuan
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8904-8918
id: chua24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8904
lastpage: 8918
published: 2024-07-08 00:00:00 +0000
- title: 'Sampling-based Multi-dimensional Recalibration'
abstract: 'Calibration of probabilistic forecasts in the regression setting has been widely studied in the single dimensional case, where the output variables are assumed to be univariate. In many problem settings, however, the output variables are multi-dimensional, and in the presence of dependence across the output dimensions, measuring calibration and performing recalibration for each dimension separately can be both misleading and detrimental. In this work, we focus on representing predictive uncertainties via samples, and propose a recalibration method which accounts for the joint distribution across output dimensions to produce calibrated samples. Based on the concept of highest density regions (HDR), we define the notion of HDR calibration, and show that our recalibration method produces samples which are HDR calibrated. We demonstrate the performance of our method and the quality of the recalibrated samples on a suite of benchmark datasets in multi-dimensional regression, a real-world dataset in modeling plasma dynamics during nuclear fusion reactions, and on a decision-making application in forecasting demand.'
volume: 235
URL: https://proceedings.mlr.press/v235/chung24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chung24a/chung24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chung24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Youngseog
family: Chung
- given: Ian
family: Char
- given: Jeff
family: Schneider
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8919-8940
id: chung24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8919
lastpage: 8940
published: 2024-07-08 00:00:00 +0000
- title: 'Prompt-tuning Latent Diffusion Models for Inverse Problems'
abstract: 'We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To improve upon this, we introduce a method for prompt tuning, which jointly optimizes the text embedding on-the-fly while running the reverse diffusion. This allows us to generate images that are more faithful to the diffusion prior. Specifically, our approach involves a unified optimization framework that simultaneously considers the prompt, latent, and pixel values through alternating minimization. This significantly diminishes image artifacts - a major problem when using latent diffusion models instead of pixel-based diffusion ones. Our method, called P2L, outperforms both pixel- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting. Furthermore, P2L demonstrates remarkable scalability to higher resolutions without artifacts.'
volume: 235
URL: https://proceedings.mlr.press/v235/chung24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/chung24b/chung24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-chung24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hyungjin
family: Chung
- given: Jong Chul
family: Ye
- given: Peyman
family: Milanfar
- given: Mauricio
family: Delbracio
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8941-8967
id: chung24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8941
lastpage: 8967
published: 2024-07-08 00:00:00 +0000
- title: 'MusicRL: Aligning Music Generation to Human Preferences'
abstract: 'We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as “upbeat workout music” can map to a retro guitar solo or a technopop beat). Not only does this make supervised training of such models challenging, but it also calls for integrating continuous human feedback in their post-deployment finetuning. MusicRL is a pretrained autoregressive MusicLM model of discrete audio tokens finetuned with reinforcement learning to maximize sequence-level rewards. We design reward functions related specifically to text-adherence and audio quality with the help of selected raters, and use those to finetune MusicLM into MusicRL-R. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. Using Reinforcement Learning from Human Feedback (RLHF), we train MusicRL-U, the first text-to-music model that incorporates human feedback at scale. Human evaluations show that both MusicRL-R and MusicRL-U are preferred to the baseline. Ultimately, MusicRL-RU combines the two approaches and results in the best model according to human raters. Ablation studies shed light on the musical attributes influencing human preferences, indicating that text adherence and quality only account for a part of it. This underscores the prevalence of subjectivity in musical appreciation and calls for further involvement of human listeners in the finetuning of music generation models. Samples can be found at google-research.github.io/seanet/musiclm/rlhf/.'
volume: 235
URL: https://proceedings.mlr.press/v235/cideron24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cideron24a/cideron24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cideron24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Geoffrey
family: Cideron
- given: Sertan
family: Girgin
- given: Mauro
family: Verzetti
- given: Damien
family: Vincent
- given: Matej
family: Kastelic
- given: Zalán
family: Borsos
- given: Brian
family: Mcwilliams
- given: Victor
family: Ungureanu
- given: Olivier
family: Bachem
- given: Olivier
family: Pietquin
- given: Matthieu
family: Geist
- given: Leonard
family: Hussenot
- given: Neil
family: Zeghidour
- given: Andrea
family: Agostinelli
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8968-8984
id: cideron24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8968
lastpage: 8984
published: 2024-07-08 00:00:00 +0000
- title: 'Graph-based Time Series Clustering for End-to-End Hierarchical Forecasting'
abstract: 'Relationships among time series can be exploited as inductive biases in learning effective forecasting models. In hierarchical time series, relationships among subsets of sequences induce hard constraints (hierarchical inductive biases) on the predicted values. In this paper, we propose a graph-based methodology to unify relational and hierarchical inductive biases in the context of deep learning for time series forecasting. In particular, we model both types of relationships as dependencies in a pyramidal graph structure, with each pyramidal layer corresponding to a level of the hierarchy. By exploiting modern - trainable - graph pooling operators we show that the hierarchical structure, if not available as a prior, can be learned directly from data, thus obtaining cluster assignments aligned with the forecasting objective. A differentiable reconciliation stage is incorporated into the processing architecture, allowing hierarchical constraints to act both as an architectural bias as well as a regularization element for predictions. Simulation results on representative datasets show that the proposed method compares favorably against the state of the art.'
volume: 235
URL: https://proceedings.mlr.press/v235/cini24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cini24a/cini24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cini24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrea
family: Cini
- given: Danilo
family: Mandic
- given: Cesare
family: Alippi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 8985-8999
id: cini24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 8985
lastpage: 8999
published: 2024-07-08 00:00:00 +0000
- title: 'Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens'
abstract: 'Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). Noting that second-order methods often only function effectively with the addition of stabilising heuristics (such as Levenberg-Marquardt damping), we ask how much these (as opposed to the second-order curvature model) contribute to second-order algorithms’ performance. We thus study *AdamQLR*: an optimiser combining damping and learning rate selection techniques from K-FAC (Martens & Grosse, 2015) with the update directions proposed by Adam, inspired by considering Adam through a second-order lens. We evaluate AdamQLR on a range of regression and classification tasks at various scales and hyperparameter tuning methodologies, concluding K-FAC’s adaptive heuristics are of variable standalone general effectiveness, and finding an *untuned* AdamQLR setting can achieve comparable performance vs runtime to *tuned* benchmarks.'
volume: 235
URL: https://proceedings.mlr.press/v235/clarke24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/clarke24a/clarke24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-clarke24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ross M
family: Clarke
- given: José Miguel
family: Hernández-Lobato
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9000-9032
id: clarke24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9000
lastpage: 9032
published: 2024-07-08 00:00:00 +0000
- title: '$\mathtt{VITS}$: Variational Inference Thompson Sampling for contextual bandits'
abstract: 'In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits. At each round, traditional TS requires samples from the current posterior distribution, which is usually intractable. To circumvent this issue, approximate inference techniques can be used and provide samples with distribution close to the posteriors. However, current approximate techniques either yield poor estimation (Laplace approximation) or can be computationally expensive (MCMC methods, Ensemble sampling...). In this paper, we propose a new algorithm, Variational Inference TS $\mathtt{VITS}$, based on Gaussian Variational Inference. This scheme provides powerful posterior approximations which are easy to sample from, and is computationally efficient, making it an ideal choice for TS. In addition, we show that $\mathtt{VITS}$ achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits. Finally, we demonstrate experimentally the effectiveness of $\mathtt{VITS}$ on both synthetic and real-world datasets.'
volume: 235
URL: https://proceedings.mlr.press/v235/clavier24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/clavier24a/clavier24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-clavier24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pierre
family: Clavier
- given: Tom
family: Huix
- given: Alain
family: Oliviero Durmus
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9033-9075
id: clavier24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9033
lastpage: 9075
published: 2024-07-08 00:00:00 +0000
- title: 'CogBench: a large language model walks into a psychology lab'
abstract: 'Large language models (LLMs) have significantly advanced the field of artificial intelligence. Yet, evaluating them comprehensively remains challenging. We argue that this is partly due to the predominant focus on performance metrics in most benchmarks. This paper introduces *CogBench*, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. This novel approach offers a toolkit for phenotyping LLMs’ behavior. We apply *CogBench* to 40 LLMs, yielding a rich and diverse dataset. We analyze this data using statistical multilevel modeling techniques, accounting for the nested dependencies among fine-tuned versions of specific LLMs. Our study highlights the crucial role of model size and reinforcement learning from human feedback (RLHF) in improving performance and aligning with human behavior. Interestingly, we find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs’ behavior. Finally, we explore the effects of prompt-engineering techniques. We discover that chain-of-thought prompting improves probabilistic reasoning, while take-a-step-back prompting fosters model-based behaviors.'
volume: 235
URL: https://proceedings.mlr.press/v235/coda-forno24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/coda-forno24a/coda-forno24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-coda-forno24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Julian
family: Coda-Forno
- given: Marcel
family: Binz
- given: Jane X
family: Wang
- given: Eric
family: Schulz
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9076-9108
id: coda-forno24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9076
lastpage: 9108
published: 2024-07-08 00:00:00 +0000
- title: 'Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices'
abstract: 'Text-to-image (T2I) diffusion models achieve state-of-the-art results in image synthesis and editing. However, leveraging such pre-trained models for video editing is considered a major challenge. Many existing works attempt to enforce temporal consistency in the edited video through explicit correspondence mechanisms, either in pixel space or between deep features. These methods, however, struggle with strong nonrigid motion. In this paper, we introduce a fundamentally different approach, which is based on the observation that spatiotemporal slices of natural videos exhibit similar characteristics to natural images. Thus, the same T2I diffusion model that is normally used only as a prior on video frames, can also serve as a strong prior for enhancing temporal consistency by applying it on spatiotemporal slices. Based on this observation, we present Slicedit, a method for text-based video editing that utilizes a pre-trained T2I diffusion model to process both spatial and spatiotemporal slices. Our method generates videos that retain the structure and motion of the original video while adhering to the target text. Through extensive experiments, we demonstrate Slicedit’s ability to edit a wide range of real-world videos, confirming its clear advantages compared to existing baselines.'
volume: 235
URL: https://proceedings.mlr.press/v235/cohen24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cohen24a/cohen24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cohen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nathaniel
family: Cohen
- given: Vladimir
family: Kulikov
- given: Matan
family: Kleiner
- given: Inbar
family: Huberman-Spiegelglas
- given: Tomer
family: Michaeli
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9109-9137
id: cohen24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9109
lastpage: 9137
published: 2024-07-08 00:00:00 +0000
- title: 'Improving Token-Based World Models with Parallel Observation Prediction'
abstract: 'Motivated by the success of Transformers when applied to sequences of discrete symbols, token-based world models (TBWMs) were recently proposed as sample-efficient methods. In TBWMs, the world model consumes agent experience as a language-like sequence of tokens, where each observation constitutes a sub-sequence. However, during imagination, the sequential token-by-token generation of next observations results in a severe bottleneck, leading to long training times, poor GPU utilization, and limited representations. To resolve this bottleneck, we devise a novel Parallel Observation Prediction (POP) mechanism. POP augments a Retentive Network (RetNet) with a novel forward mode tailored to our reinforcement learning setting. We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing a 15.4x faster imagination compared to prior TBWMs. REM attains superhuman performance on 12 out of 26 games of the Atari 100K benchmark, while training in less than 12 hours. Our code is available at https://github.com/leor-c/REM'
volume: 235
URL: https://proceedings.mlr.press/v235/cohen24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cohen24b/cohen24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cohen24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lior
family: Cohen
- given: Kaixin
family: Wang
- given: Bingyi
family: Kang
- given: Shie
family: Mannor
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9138-9160
id: cohen24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9138
lastpage: 9160
published: 2024-07-08 00:00:00 +0000
- title: 'Perturb-and-Project: Differentially Private Similarities and Marginals'
abstract: 'We revisit the objective perturbations framework for differential privacy where noise is added to the input $A\in \mathcal{S}$ and the result is then projected back to the space of admissible datasets $\mathcal{S}$. Through this framework, we first design novel efficient algorithms to privately release pair-wise cosine similarities. Second, we derive a novel algorithm to compute $k$-way marginal queries over $n$ features. Prior work could achieve comparable guarantees only for $k$ even. Furthermore, we extend our results to $t$-sparse datasets, where our efficient algorithms yield novel, stronger guarantees whenever $t\le n^{5/6}/\log n.$ Finally, we provide a theoretical perspective on why *fast* input perturbation algorithms work well in practice. The key technical ingredients behind our results are tight sum-of-squares certificates upper bounding the Gaussian complexity of sets of solutions.'
volume: 235
URL: https://proceedings.mlr.press/v235/cohen-addad24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cohen-addad24a/cohen-addad24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cohen-addad24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vincent
family: Cohen-Addad
- given: Tommaso
family: D’Orsi
- given: Alessandro
family: Epasto
- given: Vahab
family: Mirrokni
- given: Peilin
family: Zhong
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9161-9179
id: cohen-addad24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9161
lastpage: 9179
published: 2024-07-08 00:00:00 +0000
- title: 'Multi-View Stochastic Block Models'
abstract: 'Graph clustering is a central topic in unsupervised learning with a multitude of practical applications. In recent years, multi-view graph clustering has gained a lot of attention for its applicability to real-world instances where one often has access to multiple data sources. In this paper we formalize a new family of models, called *multi-view stochastic block models* that capture this setting. For this model, we first study efficient algorithms that naively work on the union of multiple graphs. Then, we introduce a new efficient algorithm that provably outperforms previous approaches by analyzing the structure of each graph separately. Finally, we complement our results with an information-theoretic lower bound studying the limits of what can be done in this model.'
volume: 235
URL: https://proceedings.mlr.press/v235/cohen-addad24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cohen-addad24b/cohen-addad24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cohen-addad24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vincent
family: Cohen-Addad
- given: Tommaso
family: D’Orsi
- given: Silvio
family: Lattanzi
- given: Rajai
family: Nasser
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9180-9207
id: cohen-addad24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9180
lastpage: 9207
published: 2024-07-08 00:00:00 +0000
- title: 'A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering'
abstract: 'We consider the semi-random graph model of [Makarychev, Makarychev and Vijayaraghavan, STOC’12], where, given a random bipartite graph with $\alpha$ edges and an unknown bipartition $(A, B)$ of the vertex set, an adversary can add arbitrary edges inside each community and remove arbitrary edges from the cut $(A, B)$ (i.e. all adversarial changes are *monotone* with respect to the bipartition). For this model, a polynomial time algorithm [MMV’12] is known to approximate the Balanced Cut problem up to value $O(\alpha)$ as long as the cut $(A, B)$ has size $\Omega(\alpha)$. However, it consists of slow subroutines requiring optimal solutions for logarithmically many semidefinite programs. We study the fine-grained complexity of the problem and present the first near-linear time algorithm that achieves similar performances to that of [MMV’12]. Our algorithm runs in time $O(|V(G)|^{1+o(1)} + |E(G)|^{1+o(1)})$ and finds a balanced cut of value $O(\alpha).$ Our approach appears easily extendible to related problems, such as Sparsest Cut, and also yields a near-linear time $O(1)$-approximation to Dasgupta’s objective function for hierarchical clustering [Dasgupta, STOC’16] for the semi-random hierarchical stochastic block model inputs of [Cohen-Addad, Kanade, Mallmann-Trenn, Mathieu, JACM’19].'
volume: 235
URL: https://proceedings.mlr.press/v235/cohen-addad24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cohen-addad24c/cohen-addad24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cohen-addad24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vincent
family: Cohen-Addad
- given: Tommaso
family: D’Orsi
- given: Aida
family: Mousavifar
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9208-9229
id: cohen-addad24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9208
lastpage: 9229
published: 2024-07-08 00:00:00 +0000
- title: 'Dynamic Correlation Clustering in Sublinear Update Time'
abstract: 'We study the classic problem of correlation clustering in dynamic vertex streams. In this setting, vertices are either added or randomly deleted over time, and each vertex pair is connected by a positive or negative edge. The objective is to continuously find a partition which minimizes the sum of positive edges crossing clusters and negative edges within clusters. We present an algorithm that maintains an $O(1)$-approximation with $O(\text{polylog} n)$ amortized update time. Prior to our work, Behnezhad et al. (SODA 2023) achieved a $5$-approximation with $O(1)$ expected update time in edge streams, which translates in vertex streams to an $O(D)$ update time, where $D$ is the maximum possible degree. Finally, we complement our theoretical analysis with experiments on real-world data.'
volume: 235
URL: https://proceedings.mlr.press/v235/cohen-addad24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cohen-addad24d/cohen-addad24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cohen-addad24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vincent
family: Cohen-Addad
- given: Silvio
family: Lattanzi
- given: Andreas
family: Maggiori
- given: Nikos
family: Parotsidis
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9230-9274
id: cohen-addad24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9230
lastpage: 9274
published: 2024-07-08 00:00:00 +0000
- title: 'A2Q+: Improving Accumulator-Aware Weight Quantization'
abstract: 'Quantization techniques commonly reduce the inference costs of neural networks by restricting the precision of weights and activations. Recent studies show that also reducing the precision of the accumulator can further improve hardware efficiency at the risk of numerical overflow, which introduces arithmetic errors that can degrade model accuracy. To avoid numerical overflow while maintaining accuracy, recent work proposed accumulator-aware quantization (A2Q)—a quantization-aware training method that constrains model weights during training to safely use a target accumulator bit width during inference. Although this shows promise, we demonstrate that A2Q relies on an overly restrictive constraint and a sub-optimal weight initialization strategy that each introduce superfluous quantization error. To address these shortcomings, we introduce: (1) an improved bound that alleviates accumulator constraints without compromising overflow avoidance; and (2) a new strategy for initializing quantized weights from pre-trained floating-point checkpoints. We combine these contributions with weight normalization to introduce A2Q+. We identify and characterize the various tradeoffs that arise as a consequence of accumulator constraints and support our analysis with experiments that show A2Q+ significantly improves these trade-offs when compared to prior methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/colbert24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/colbert24a/colbert24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-colbert24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ian
family: Colbert
- given: Alessandro
family: Pappalardo
- given: Jakoba
family: Petri-Koenig
- given: Yaman
family: Umuroglu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9275-9291
id: colbert24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9275
lastpage: 9291
published: 2024-07-08 00:00:00 +0000
- title: 'Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks'
abstract: 'An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical studies have shown that shallow NNs learn meaningful features when either (i) they are trained on a *single* task or (ii) they are *linear*, very little is known about the closer-to-practice case of *nonlinear* NNs trained on *multiple* tasks. In this work, we present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks. Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks. Using this observation, we show that when the tasks are binary classification tasks with labels depending on the projection of the data onto an $r$-dimensional subspace within the $d\gg r$-dimensional input space, a simple gradient-based multitask learning algorithm on a two-layer ReLU NN recovers this projection, allowing for generalization to downstream tasks with sample and neuron complexity independent of $d$. In contrast, we show that with high probability over the draw of a single task, training on this single task cannot guarantee to learn all $r$ ground-truth features.'
volume: 235
URL: https://proceedings.mlr.press/v235/collins24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/collins24a/collins24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-collins24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Liam
family: Collins
- given: Hamed
family: Hassani
- given: Mahdi
family: Soltanolkotabi
- given: Aryan
family: Mokhtari
- given: Sanjay
family: Shakkottai
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9292-9345
id: collins24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9292
lastpage: 9345
published: 2024-07-08 00:00:00 +0000
- title: 'Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback'
abstract: 'Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans’ expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about “collective” preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.'
volume: 235
URL: https://proceedings.mlr.press/v235/conitzer24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/conitzer24a/conitzer24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-conitzer24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vincent
family: Conitzer
- given: Rachel
family: Freedman
- given: Jobst
family: Heitzig
- given: Wesley H.
family: Holliday
- given: Bob M.
family: Jacobs
- given: Nathan
family: Lambert
- given: Milan
family: Mosse
- given: Eric
family: Pacuit
- given: Stuart
family: Russell
- given: Hailey
family: Schoelkopf
- given: Emanuel
family: Tewolde
- given: William S.
family: Zwicker
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9346-9360
id: conitzer24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9346
lastpage: 9360
published: 2024-07-08 00:00:00 +0000
- title: 'Statistical Inference Under Constrained Selection Bias'
abstract: 'Large-scale datasets are increasingly being used to inform decision making. While this effort aims to ground policy in real-world evidence, challenges have arisen as selection bias and other forms of distribution shifts often plague observational data. Previous attempts to provide robust inference have given guarantees depending on a user-specified amount of possible distribution shift (e.g., the maximum KL divergence between the observed and target distributions). However, decision makers will often have additional knowledge about the target distribution which constrains the kind of possible shifts. To leverage such information, we propose a framework that enables statistical inference in the presence of selection bias which obeys user-specified constraints in the form of functions whose expectation is known under the target distribution. The output is high-probability bounds on the value of an estimand for the target distribution. Hence, our method leverages domain knowledge in order to partially identify a wide class of estimands. We analyze the computational and statistical properties of methods to estimate these bounds and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks, as well as in a real-world use case.'
volume: 235
URL: https://proceedings.mlr.press/v235/cortes-gomez24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cortes-gomez24a/cortes-gomez24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cortes-gomez24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Santiago
family: Cortes-Gomez
- given: Mateo
family: Dulce Rubio
- given: Carlos Miguel
family: Patiño
- given: Bryan
family: Wilder
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9361-9379
id: cortes-gomez24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9361
lastpage: 9379
published: 2024-07-08 00:00:00 +0000
- title: 'Scaling Laws for the Value of Individual Data Points in Machine Learning'
abstract: 'Recent works have shown that machine learning models improve at a predictable rate with the amount of training data, leading to scaling laws that describe the relationship between error and dataset size. These scaling laws can help determine a model’s training dataset, but they take an aggregate view of the data by only considering the dataset’s size. We consider a new perspective by investigating scaling behavior for the value of individual data points: we find that a data point’s contribution to a model’s performance shrinks predictably with the size of the dataset in a log-linear manner. Interestingly, there is significant variability in the scaling exponent among different data points, indicating that certain points are more valuable in small datasets and other points are relatively more useful as a part of large datasets. We provide learning theory support for our scaling laws and we observe empirically that they hold across several model classes. We further propose a maximum likelihood estimator and an amortized estimator to efficiently learn the individualized scaling behaviors from a small number of noisy observations per data point. Using our efficient estimators, we provide insights into factors that influence the scaling behavior of different data points. Finally, we demonstrate applications of the individualized scaling laws to data valuation and data subset selection.'
volume: 235
URL: https://proceedings.mlr.press/v235/covert24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/covert24a/covert24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-covert24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ian Connick
family: Covert
- given: Wenlong
family: Ji
- given: Tatsunori
family: Hashimoto
- given: James
family: Zou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9380-9406
id: covert24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9380
lastpage: 9406
published: 2024-07-08 00:00:00 +0000
- title: 'Time Series Diffusion in the Frequency Domain'
abstract: 'Fourier analysis has been an instrumental tool in the development of signal processing. This leads us to wonder whether this framework could similarly benefit generative modelling. In this paper, we explore this question through the scope of time series diffusion models. More specifically, we analyze whether representing time series in the frequency domain is a useful inductive bias for score-based diffusion models. By starting from the canonical SDE formulation of diffusion in the time domain, we show that a dual diffusion process occurs in the frequency domain with an important nuance: Brownian motions are replaced by what we call mirrored Brownian motions, characterized by mirror symmetries among their components. Building on this insight, we show how to adapt the denoising score matching approach to implement diffusion models in the frequency domain. This results in frequency diffusion models, which we compare to canonical time diffusion models. Our empirical evaluation on real-world datasets, covering various domains like healthcare and finance, shows that frequency diffusion models better capture the training distribution than time diffusion models. We explain this observation by showing that time series from these datasets tend to be more localized in the frequency domain than in the time domain, which makes them easier to model in the former case. All our observations point towards impactful synergies between Fourier analysis and diffusion models.'
volume: 235
URL: https://proceedings.mlr.press/v235/crabbe24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/crabbe24a/crabbe24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-crabbe24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonathan
family: Crabbé
- given: Nicolas
family: Huynh
- given: Jan Pawel
family: Stanczuk
- given: Mihaela
family: Van Der Schaar
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9407-9438
id: crabbe24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9407
lastpage: 9438
published: 2024-07-08 00:00:00 +0000
- title: 'Conformal Prediction Sets Improve Human Decision Making'
abstract: 'In response to everyday queries, humans explicitly signal uncertainty and offer alternative answers when they are unsure. Machine learning models that output calibrated prediction sets through conformal prediction mimic this human behaviour; larger sets signal greater uncertainty while providing alternatives. In this work, we study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial with conformal prediction sets provided to human subjects. With statistical significance, we find that when humans are given conformal prediction sets their accuracy on tasks improves compared to fixed-size prediction sets with the same coverage guarantee. The results show that quantifying model uncertainty with conformal prediction is helpful for human-in-the-loop decision making and human-AI teams.'
volume: 235
URL: https://proceedings.mlr.press/v235/cresswell24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cresswell24a/cresswell24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cresswell24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jesse C.
family: Cresswell
- given: Yi
family: Sui
- given: Bhargava
family: Kumar
- given: Noël
family: Vouitsis
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9439-9457
id: cresswell24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9439
lastpage: 9457
published: 2024-07-08 00:00:00 +0000
- title: 'Agent Instructs Large Language Models to be General Zero-Shot Reasoners'
abstract: 'We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. To enable this, our agent only needs to generate a single set of instructions for each task. These instructions turn out to be extremely effective for improving the reasoning process of different large language models across all task instances. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b, Llama-2-70b-chat, and GPT-3.5 Turbo. Compared to zero-shot chain of thought, our improvement in reasoning is striking. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo significantly.'
volume: 235
URL: https://proceedings.mlr.press/v235/crispino24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/crispino24a/crispino24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-crispino24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nicholas
family: Crispino
- given: Kyle
family: Montgomery
- given: Fankun
family: Zeng
- given: Dawn
family: Song
- given: Chenguang
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9458-9549
id: crispino24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9458
lastpage: 9549
published: 2024-07-08 00:00:00 +0000
- title: 'Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers'
abstract: 'We present the Hourglass Diffusion Transformer (HDiT), an image-generative model that exhibits linear scaling with pixel count, supporting training at high resolution (e.g. $1024 \times 1024$) directly in pixel-space. Building on the Transformer architecture, which is known to scale to billions of parameters, it bridges the gap between the efficiency of convolutional U-Nets and the scalability of Transformers. HDiT trains successfully without typical high-resolution training techniques such as multiscale architectures, latent autoencoders or self-conditioning. We demonstrate that HDiT performs competitively with existing models on ImageNet $256^2$, and sets a new state-of-the-art for diffusion models on FFHQ-$1024^2$. Code is available at https://github.com/crowsonkb/k-diffusion.'
volume: 235
URL: https://proceedings.mlr.press/v235/crowson24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/crowson24a/crowson24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-crowson24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Katherine
family: Crowson
- given: Stefan Andreas
family: Baumann
- given: Alex
family: Birch
- given: Tanishq Mathew
family: Abraham
- given: Daniel Z
family: Kaplan
- given: Enrico
family: Shippole
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9550-9575
id: crowson24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9550
lastpage: 9575
published: 2024-07-08 00:00:00 +0000
- title: 'Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis'
abstract: 'Many algorithms have been recently proposed for causal machine learning. Yet, there is little to no theory on their quality, especially considering finite samples. In this work, we propose a theory based on generalization bounds that provides such guarantees. By introducing a novel change-of-measure inequality, we are able to tightly bound the model loss in terms of the deviation of the treatment propensities over the population, which we show can be empirically limited. Our theory is fully rigorous and holds even in the face of hidden confounding and violations of positivity. We demonstrate our bounds on semi-synthetic and real data, showcasing their remarkable tightness and practical utility.'
volume: 235
URL: https://proceedings.mlr.press/v235/csillag24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/csillag24a/csillag24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-csillag24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniel
family: Csillag
- given: Claudio Jose
family: Struchiner
- given: Guilherme Tegoni
family: Goedert
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9576-9602
id: csillag24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9576
lastpage: 9602
published: 2024-07-08 00:00:00 +0000
- title: 'Major-Minor Mean Field Multi-Agent Reinforcement Learning'
abstract: 'Multi-agent reinforcement learning (MARL) remains difficult to scale to many agents. Recent MARL using Mean Field Control (MFC) provides a tractable and rigorous approach to otherwise difficult cooperative MARL. However, the strict MFC assumption of many independent, weakly-interacting agents is too inflexible in practice. We generalize MFC to instead simultaneously model many similar and few complex agents – as Major-Minor Mean Field Control (M3FC). Theoretically, we give approximation results for finite agent control, and verify the sufficiency of stationary policies for optimality together with a dynamic programming principle. Algorithmically, we propose Major-Minor Mean Field MARL (M3FMARL) for finite agent systems instead of the limiting system. The algorithm is shown to approximate the policy gradient of the underlying M3FC MDP. Finally, we demonstrate its capabilities experimentally in various scenarios. We observe a strong performance in comparison to state-of-the-art policy gradient MARL methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/cui24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cui24a/cui24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cui24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kai
family: Cui
- given: Christian
family: Fabian
- given: Anam
family: Tahir
- given: Heinz
family: Koeppl
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9603-9632
id: cui24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9603
lastpage: 9632
published: 2024-07-08 00:00:00 +0000
- title: 'Learning Latent Space Hierarchical EBM Diffusion Models'
abstract: 'This work studies the learning problem of the energy-based prior model and the multi-layer generator model. The multi-layer generator model, which contains multiple layers of latent variables organized in a top-down hierarchical structure, typically assumes the Gaussian prior model. Such a prior model can be limited in modelling expressivity, which results in a gap between the generator posterior and the prior model, known as the prior hole problem. Recent works have explored learning the energy-based (EBM) prior model as a second-stage, complementary model to bridge the gap. However, the EBM defined on a multi-layer latent space can be highly multi-modal, which makes sampling from such marginal EBM prior challenging in practice, resulting in ineffectively learned EBM. To tackle the challenge, we propose to leverage the diffusion probabilistic scheme to mitigate the burden of EBM sampling and thus facilitate EBM learning. Our extensive experiments demonstrate a superior performance of our diffusion-learned EBM prior on various challenging tasks.'
volume: 235
URL: https://proceedings.mlr.press/v235/cui24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cui24b/cui24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cui24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiali
family: Cui
- given: Tian
family: Han
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9633-9645
id: cui24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9633
lastpage: 9645
published: 2024-07-08 00:00:00 +0000
- title: 'Harmonizing Generalization and Personalization in Federated Prompt Learning'
abstract: 'Federated Prompt Learning (FPL) incorporates large pre-trained Vision-Language models (VLM) into federated learning through prompt tuning. The transferable representations and remarkable generalization capacity of VLM make them highly compatible with the integration of federated learning. Addressing data heterogeneity in federated learning requires personalization, but excessive focus on it across clients could compromise the model’s ability to generalize effectively. To preserve the impressive generalization capability of VLM, it is crucial to strike a balance between personalization and generalization in FPL. To tackle this challenge, we propose Federated Prompt Learning with CLIP Generalization and low-rank Personalization (FedPGP), which employs pre-trained CLIP to provide knowledge guidance on the global prompt for improved generalization and incorporates a low-rank adaptation term to personalize the global prompt. Further, FedPGP integrates a prompt-wise contrastive loss to achieve knowledge guidance and personalized adaptation simultaneously, enabling a harmonious balance between personalization and generalization in FPL. We conduct extensive experiments on various datasets to explore base-to-novel generalization in both category-level and domain-level scenarios with heterogeneous data, showing the superiority of FedPGP in balancing generalization and personalization.'
volume: 235
URL: https://proceedings.mlr.press/v235/cui24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cui24c/cui24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cui24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tianyu
family: Cui
- given: Hongxia
family: Li
- given: Jingya
family: Wang
- given: Ye
family: Shi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9646-9661
id: cui24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9646
lastpage: 9661
published: 2024-07-08 00:00:00 +0000
- title: 'Asymptotics of feature learning in two-layer networks after one gradient-step'
abstract: 'In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), we provide an exact asymptotic description of the generalization error of the sRF in the high-dimensional limit where the number of samples, the width, and the input dimension grow at a proportional rate. The resulting characterization for sRFs also closely captures the learning curves of the original network model. This enables us to understand how adapting to the data is crucial for the network to efficiently learn non-linear functions in the direction of the gradient, whereas at initialization it can only express linear functions in this regime.'
volume: 235
URL: https://proceedings.mlr.press/v235/cui24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cui24d/cui24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cui24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hugo
family: Cui
- given: Luca
family: Pesce
- given: Yatin
family: Dandi
- given: Florent
family: Krzakala
- given: Yue
family: Lu
- given: Lenka
family: Zdeborova
- given: Bruno
family: Loureiro
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9662-9695
id: cui24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9662
lastpage: 9695
published: 2024-07-08 00:00:00 +0000
- title: 'Ameliorate Spurious Correlations in Dataset Condensation'
abstract: 'Dataset Condensation has emerged as a technique for compressing large datasets into smaller synthetic counterparts, facilitating downstream training tasks. In this paper, we study the impact of bias inside the original dataset on the performance of dataset condensation. With a comprehensive empirical evaluation on canonical datasets with color, corruption and background biases, we find that color and background biases in the original dataset will be amplified through the condensation process, resulting in a notable decline in the performance of models trained on the condensed dataset, while corruption bias is suppressed through the condensation process. To reduce bias amplification in dataset condensation, we introduce a simple yet highly effective approach based on a sample reweighting scheme utilizing kernel density estimation. Empirical results on multiple real-world and synthetic datasets demonstrate the effectiveness of the proposed method. Notably, on CMNIST with 5% bias-conflict ratio and IPC 50, our method achieves 91.5% test accuracy compared to 23.8% from vanilla DM, boosting the performance by 67.7%, whereas applying state-of-the-art debiasing method on the same dataset only achieves 53.7% accuracy. Our findings highlight the importance of addressing biases in dataset condensation and provide a promising avenue to address bias amplification in the process.'
volume: 235
URL: https://proceedings.mlr.press/v235/cui24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cui24e/cui24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cui24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Justin
family: Cui
- given: Ruochen
family: Wang
- given: Yuanhao
family: Xiong
- given: Cho-Jui
family: Hsieh
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9696-9721
id: cui24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9696
lastpage: 9721
published: 2024-07-08 00:00:00 +0000
- title: 'ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback'
abstract: 'Learning from human feedback has become a pivotal technique in aligning large language models (LLMs) with human preferences. However, acquiring vast and premium human feedback is bottlenecked by time, labor, and human capability, resulting in small sizes or limited topics of current datasets. This further hinders feedback learning as well as alignment research within the open-source community. To address this issue, we explore how to go beyond human feedback and collect high-quality AI feedback automatically for a scalable alternative. Specifically, we identify scale and diversity as the key factors for feedback data to take effect. Accordingly, we first broaden instructions and responses in both amount and breadth to encompass a wider range of user-assistant interactions. Then, we meticulously apply a series of techniques to mitigate annotation biases for more reliable AI feedback. We finally present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset, which contains over 1 million GPT-4 feedback annotations for 250k user-assistant conversations from various aspects. Built upon UltraFeedback, we align a LLaMA-based model by best-of-$n$ sampling and reinforcement learning, demonstrating its exceptional performance on chat benchmarks. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models, serving as a solid foundation for future feedback learning research.'
volume: 235
URL: https://proceedings.mlr.press/v235/cui24f.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cui24f/cui24f.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cui24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ganqu
family: Cui
- given: Lifan
family: Yuan
- given: Ning
family: Ding
- given: Guanming
family: Yao
- given: Bingxiang
family: He
- given: Wei
family: Zhu
- given: Yuan
family: Ni
- given: Guotong
family: Xie
- given: Ruobing
family: Xie
- given: Yankai
family: Lin
- given: Zhiyuan
family: Liu
- given: Maosong
family: Sun
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9722-9744
id: cui24f
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9722
lastpage: 9744
published: 2024-07-08 00:00:00 +0000
- title: 'Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples'
abstract: 'In guaranteeing the absence of adversarial examples in an instance’s neighbourhood, certification mechanisms play an important role in demonstrating neural net robustness. In this paper, we ask whether these certifications can compromise the very models they help to protect. Our new *Certification Aware Attack* exploits certifications to produce computationally efficient norm-minimising adversarial examples $74$% more often than comparable attacks, while reducing the median perturbation norm by more than $10$%. While these attacks can be used to assess the tightness of certification bounds, they also highlight that releasing certifications can paradoxically reduce security.'
volume: 235
URL: https://proceedings.mlr.press/v235/cullen24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cullen24a/cullen24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cullen24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrew Craig
family: Cullen
- given: Shijie
family: Liu
- given: Paul
family: Montague
- given: Sarah Monazam
family: Erfani
- given: Benjamin I. P.
family: Rubinstein
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9745-9761
id: cullen24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9745
lastpage: 9761
published: 2024-07-08 00:00:00 +0000
- title: 'Differentially Private Decentralized Learning with Random Walks'
abstract: 'The popularity of federated learning comes from the possibility of better scalability and the ability for participants to keep control of their data, improving data security and sovereignty. Unfortunately, sharing model updates also creates a new privacy attack surface. In this work, we characterize the privacy guarantees of decentralized learning with random walk algorithms, where a model is updated by traveling from one node to another along the edges of a communication graph. Using a recent variant of differential privacy tailored to the study of decentralized algorithms, namely Pairwise Network Differential Privacy, we derive closed-form expressions for the privacy loss between each pair of nodes where the impact of the communication topology is captured by graph theoretic quantities. Our results further reveal that random walk algorithms tend to yield better privacy guarantees than gossip algorithms for nodes close to each other. We supplement our theoretical results with empirical evaluation on synthetic and real-world graphs and datasets.'
volume: 235
URL: https://proceedings.mlr.press/v235/cyffers24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/cyffers24a/cyffers24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-cyffers24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Edwige
family: Cyffers
- given: Aurélien
family: Bellet
- given: Jalaj
family: Upadhyay
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9762-9783
id: cyffers24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9762
lastpage: 9783
published: 2024-07-08 00:00:00 +0000
- title: 'Getting the most out of your tokenizer for pre-training and domain adaptation'
abstract: 'Tokenization is an understudied and often neglected component of modern LLMs. Most published works use a single tokenizer for all experiments, often borrowed from another model, without performing ablations or analysis to optimize tokenization. Moreover, the tokenizer is generally kept unchanged when fine-tuning a base model. In this paper, we show that the size, pre-tokenization regular expression, and training data of a tokenizer can significantly impact the model’s generation speed, effective context size, memory usage, and downstream performance. We train specialized Byte-Pair Encoding code tokenizers, and conduct extensive ablations on the impact of tokenizer design on the performance of LLMs for code generation tasks such as HumanEval and MBPP, and provide recommendations for tokenizer hyper-parameters selection and switching the tokenizer in a pre-trained LLM. We perform our experiments on models trained from scratch and from pre-trained models, verifying their applicability to a wide range of use-cases. We find that when fine-tuning on more than 50 billion tokens, we can specialize the tokenizer of a pre-trained LLM to obtain large gains in generation speed and effective context size.'
volume: 235
URL: https://proceedings.mlr.press/v235/dagan24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dagan24a/dagan24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dagan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gautier
family: Dagan
- given: Gabriel
family: Synnaeve
- given: Baptiste
family: Roziere
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9784-9805
id: dagan24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9784
lastpage: 9805
published: 2024-07-08 00:00:00 +0000
- title: 'Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training'
abstract: 'In this paper, we investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems, focusing on enhancing both efficiency and practicality. As distributed ML systems become integral for complex ML tasks, ensuring resilience against Byzantine failures—where workers may contribute incorrect updates due to malice or error—gains paramount importance. Our first contribution is the introduction of the Centered Trimmed Meta Aggregator (CTMA), an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels, while requiring low computational demands. Additionally, we propose harnessing a recently developed gradient estimation technique based on a double-momentum strategy within the Byzantine context. Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process and reducing the reliance on numerous hyperparameters. The effectiveness of this technique is supported by theoretical insights within the stochastic convex optimization (SCO) framework and corroborated by empirical evidence.'
volume: 235
URL: https://proceedings.mlr.press/v235/dahan24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dahan24a/dahan24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dahan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tehila
family: Dahan
- given: Kfir Yehuda
family: Levy
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9806-9833
id: dahan24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9806
lastpage: 9833
published: 2024-07-08 00:00:00 +0000
- title: 'Position: Beyond Personhood: Agency, Accountability, and the Limits of Anthropomorphic Ethical Analysis'
abstract: 'What is *agency,* and why does it matter? In this work, we draw from the political science and philosophy literature and give two competing visions of what it means to be an (ethical) agent. The first view, which we term *mechanistic*, is commonly—and implicitly—assumed in AI research, yet it is a fundamentally limited means to understand the ethical characteristics of AI. Under the second view, which we term *volitional*, AI can no longer be considered an ethical agent. We discuss the implications of each of these views for two critical questions: first, what the ideal system “ought” to look like, and second, how accountability may be achieved. In light of this discussion, we ultimately argue that, in the context of ethically-significant behavior, AI should be viewed not as an agent but as the outcome of political processes.'
volume: 235
URL: https://proceedings.mlr.press/v235/dai24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dai24a/dai24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dai24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jessica
family: Dai
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9834-9845
id: dai24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9834
lastpage: 9845
published: 2024-07-08 00:00:00 +0000
- title: 'Multi-View Clustering by Inter-cluster Connectivity Guided Reward'
abstract: 'Multi-view clustering has been widely explored for its effectiveness in harmonizing heterogeneity along with consistency in different views of data. Despite the significant progress made by recent works, the performance of most existing methods is heavily reliant on strong prior information regarding the true cluster number $\textit{K}$, which is rarely feasible in real-world scenarios. In this paper, we propose a novel graph-based multi-view clustering algorithm to infer unknown $\textit{K}$ through a graph consistency reward mechanism. To be specific, we evaluate the cluster indicator matrix during each iteration with respect to diverse $\textit{K}$. We formulate the inference process of unknown $\textit{K}$ as a parsimonious reinforcement learning paradigm, where the reward is measured by inter-cluster connectivity. As a result, our approach is capable of independently producing the final clustering result, free from the input of a predefined cluster number. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach in comparison to existing state-of-the-art methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/dai24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dai24b/dai24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dai24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hao
family: Dai
- given: Yang
family: Liu
- given: Peng
family: Su
- given: Hecheng
family: Cai
- given: Shudong
family: Huang
- given: Jiancheng
family: Lv
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9846-9855
id: dai24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9846
lastpage: 9855
published: 2024-07-08 00:00:00 +0000
- title: 'High-Order Contrastive Learning with Fine-grained Comparative Levels for Sparse Ordinal Tensor Completion'
abstract: 'Contrastive learning is a powerful paradigm for representation learning with prominent success in computer vision and NLP, but how to extend its success to high-dimensional tensors remains a challenge. This is because tensor data often exhibit high-order mode-interactions that are hard to profile, with negative samples growing combinatorially faster than in second-order contrastive learning; furthermore, many real-world tensors have ordinal entries that necessitate more delicate comparative levels. To solve the challenge, we propose High-Order Contrastive Tensor Completion (HOCTC), an innovative network to extend contrastive learning to sparse ordinal tensor data. HOCTC employs a novel attention-based strategy with query-expansion to capture high-order mode interactions even in the case of very limited tokens, which transcends second-order learning scenarios. Moreover, it extends two-level comparisons (positive-vs-negative) to fine-grained contrast-levels using ordinal tensor entries as natural guidance. An efficient sampling scheme is proposed to enforce such delicate comparative structures, generating comprehensive self-supervised signals for high-order representation learning. Extensive experiments show that HOCTC achieves promising results in sparse tensor completion in traffic/recommender applications.'
volume: 235
URL: https://proceedings.mlr.press/v235/dai24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dai24c/dai24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dai24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yu
family: Dai
- given: Junchen
family: Shen
- given: Zijie
family: Zhai
- given: Danlin
family: Liu
- given: Jingyang
family: Chen
- given: Yu
family: Sun
- given: Ping
family: Li
- given: Jie
family: Zhang
- given: Kai
family: Zhang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9856-9871
id: dai24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9856
lastpage: 9871
published: 2024-07-08 00:00:00 +0000
- title: 'Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation'
abstract: 'A key aspect of Safe Reinforcement Learning (Safe RL) involves estimating the constraint condition for the next policy, which is crucial for guiding the optimization of safe policy updates. However, the existing *Advantage-based Estimation* (ABE) method relies on the infinite-horizon discounted advantage function. This dependence leads to catastrophic errors in finite-horizon scenarios with non-discounted constraints, resulting in safety-violation updates. In response, we propose the first estimation method for finite-horizon non-discounted constraints in deep Safe RL, termed *Gradient-based Estimation* (GBE), which relies on the analytic gradient derived along trajectories. Our theoretical and empirical analyses demonstrate that GBE can effectively estimate constraint changes over a finite horizon. Constructing a surrogate optimization problem with GBE, we developed a novel Safe RL algorithm called *Constrained Gradient-based Policy Optimization* (CGPO). CGPO identifies feasible optimal policies by iteratively resolving sub-problems within trust regions. Our empirical results reveal that CGPO, unlike baseline algorithms, successfully estimates the constraint functions of subsequent policies, thereby ensuring the efficiency and feasibility of each update.'
volume: 235
URL: https://proceedings.mlr.press/v235/dai24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dai24d/dai24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dai24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Juntao
family: Dai
- given: Yaodong
family: Yang
- given: Qian
family: Zheng
- given: Gang
family: Pan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9872-9903
id: dai24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9872
lastpage: 9903
published: 2024-07-08 00:00:00 +0000
- title: 'Averaging $n$-step Returns Reduces Variance in Reinforcement Learning'
abstract: 'Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods. The variance of the multistep returns becomes the limiting factor in their length; looking too far into the future increases variance and reverses the benefits of multistep learning. In our work, we demonstrate the ability of compound returns—weighted averages of $n$-step returns—to reduce variance. We prove for the first time that any compound return with the same contraction modulus as a given $n$-step return has strictly lower variance. We additionally prove that this variance-reduction property improves the finite-sample complexity of temporal-difference learning under linear function approximation. Because general compound returns can be expensive to implement, we introduce two-bootstrap returns which reduce variance while remaining efficient, even when using minibatched experience replay. We conduct experiments showing that compound returns often increase the sample efficiency of $n$-step deep RL agents like DQN and PPO.'
volume: 235
URL: https://proceedings.mlr.press/v235/daley24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/daley24a/daley24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-daley24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brett
family: Daley
- given: Martha
family: White
- given: Marlos
family: C. Machado
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9904-9930
id: daley24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9904
lastpage: 9930
published: 2024-07-08 00:00:00 +0000
- title: 'Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models'
abstract: 'Given a graph with positive and negative edge labels, the correlation clustering problem aims to cluster the nodes so as to minimize the total number of between-cluster positive and within-cluster negative edges. This problem has many applications in data mining, particularly in unsupervised learning. Inspired by the prevalence of large graphs and constantly changing data in modern applications, we study correlation clustering in dynamic, parallel (MPC), and local computation (LCA) settings. We design an approach that improves state-of-the-art runtime complexities in all these settings. In particular, we provide the first fully dynamic algorithm that runs in an expected amortized constant time, without any dependence on the graph size. Moreover, our algorithm essentially matches the approximation guarantee of the celebrated Pivot algorithm.'
volume: 235
URL: https://proceedings.mlr.press/v235/dalirrooyfard24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dalirrooyfard24a/dalirrooyfard24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dalirrooyfard24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mina
family: Dalirrooyfard
- given: Konstantin
family: Makarychev
- given: Slobodan
family: Mitrovic
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9931-9952
id: dalirrooyfard24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9931
lastpage: 9952
published: 2024-07-08 00:00:00 +0000
- title: 'Physics and Lie symmetry informed Gaussian processes'
abstract: 'Physics-informed machine learning (PIML) has established itself as a new scientific paradigm which enables the seamless integration of observational data with partial differential equation (PDE) based physics models. A powerful tool for the analysis, reduction and solution of PDEs is the Lie symmetry method. Nevertheless, only recently has the integration of such symmetries into PIML frameworks begun to be explored. The present work adds to this growing literature by introducing an approach for incorporating a Lie symmetry into a physics-informed Gaussian process (GP) model. The symmetry is introduced as a constraint on the GP; either in a soft manner via virtual observations of an induced PDE called the invariant surface condition, or explicitly through the design of the kernel. Experimental results demonstrate that the use of symmetry constraints improves the performance of the GP for both forward and inverse problems, and that our approach offers competitive performance with neural networks in the low-data environment.'
volume: 235
URL: https://proceedings.mlr.press/v235/dalton24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dalton24a/dalton24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dalton24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Dalton
- given: Dirk
family: Husmeier
- given: Hao
family: Gao
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9953-9975
id: dalton24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9953
lastpage: 9975
published: 2024-07-08 00:00:00 +0000
- title: 'Exploring the Enigma of Neural Dynamics Through A Scattering-Transform Mixer Landscape for Riemannian Manifold'
abstract: 'The human brain is a complex inter-wired system that exhibits spontaneous functional fluctuations. In spite of tremendous success in the experimental neuroscience field, a system-level understanding of how brain anatomy supports various neural activities remains elusive. Capitalizing on the unprecedented amount of neuroimaging data, we present a physics-informed deep model to uncover the coupling mechanism between brain structure and function through the lens of data geometry that is rooted in the widespread wiring topology of connections between distant brain regions. Since deciphering the puzzle of self-organized patterns in functional fluctuations is the gateway to understanding the emergence of cognition and behavior, we devise a geometric deep model to uncover manifold mapping functions that characterize the intrinsic feature representations of evolving functional fluctuations on the Riemannian manifold. In lieu of learning unconstrained mapping functions, we introduce a set of graph-harmonic scattering transforms to impose the brain-wide geometry on top of manifold mapping functions, which allows us to cast the manifold-based deep learning into an architecture reminiscent of the *MLP-Mixer* (in computer vision) for the Riemannian manifold. As a proof-of-concept approach, we explore a neural-manifold perspective to understand the relationship between (static) brain structure and (dynamic) function, challenging the prevailing notion in cognitive neuroscience by proposing that neural activities are essentially excited by brain-wide oscillation waves living on the geometry of human connectomes, instead of being confined to focal areas.'
volume: 235
URL: https://proceedings.mlr.press/v235/dan24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dan24a/dan24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dan24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tingting
family: Dan
- given: Ziquan
family: Wei
- given: Won Hwa
family: Kim
- given: Guorong
family: Wu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9976-9990
id: dan24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9976
lastpage: 9990
published: 2024-07-08 00:00:00 +0000
- title: 'The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents'
abstract: 'We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the limitations of gradient flow and single-pass GD given by the information exponent (Ben Arous et al., 2021) and leap exponent (Abbe et al., 2023) of the target function. We show that upon re-using batches, the network achieves in just two time steps an overlap with the target subspace even for functions not satisfying the staircase property (Abbe et al., 2021). We characterize the (broad) class of functions efficiently learned in finite time. The proof of our results is based on the analysis of the Dynamical Mean-Field Theory (DMFT). We further provide a closed-form description of the dynamical process of the low-dimensional projections of the weights, and numerical experiments illustrating the theory.'
volume: 235
URL: https://proceedings.mlr.press/v235/dandi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dandi24a/dandi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dandi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yatin
family: Dandi
- given: Emanuele
family: Troiani
- given: Luca
family: Arnaboldi
- given: Luca
family: Pesce
- given: Lenka
family: Zdeborova
- given: Florent
family: Krzakala
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 9991-10016
id: dandi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 9991
lastpage: 10016
published: 2024-07-08 00:00:00 +0000
- title: 'Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Features Model'
abstract: 'The current paradigm of training deep neural networks for classification tasks includes minimizing the empirical risk, pushing the training loss value towards zero even after the training classification error has vanished. In this terminal phase of training, it has been observed that the last-layer features collapse to their class-means and these class-means converge to the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is termed Neural Collapse ($\mathcal{NC}$). However, this characterization only holds in class-balanced datasets where every class has the same number of training samples. When the training dataset is class-imbalanced, some $\mathcal{NC}$ properties will no longer hold true, for example, the geometry of class-means will skew away from the simplex ETF. In this paper, we generalize $\mathcal{NC}$ to the imbalanced regime for cross-entropy loss under the unconstrained ReLU features model. We demonstrate that while the within-class features collapse property still holds in this setting, the class-means will converge to a structure consisting of orthogonal vectors with lengths dependent on the number of training samples. Furthermore, we find that the classifier weights (i.e., the last-layer linear classifier) are aligned to the scaled and centered class-means, with scaling factors dependent on the number of training samples of each class. This generalizes $\mathcal{NC}$ in the class-balanced setting. We empirically validate our results through experiments on practical architectures and datasets.'
volume: 235
URL: https://proceedings.mlr.press/v235/dang24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dang24a/dang24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dang24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hien
family: Dang
- given: Tho Tran
family: Huu
- given: Tan Minh
family: Nguyen
- given: Nhat
family: Ho
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10017-10040
id: dang24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10017
lastpage: 10040
published: 2024-07-08 00:00:00 +0000
- title: 'Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality'
abstract: 'While Transformers have been the main architecture behind deep learning’s success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured *semiseparable matrices*. Our state space duality (SSD) framework allows us to design a new architecture (**Mamba-2**) whose core layer is a refinement of Mamba’s selective SSM that is 2-8$\times$ faster, while continuing to be competitive with Transformers on language modeling.'
volume: 235
URL: https://proceedings.mlr.press/v235/dao24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dao24a/dao24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dao24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tri
family: Dao
- given: Albert
family: Gu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10041-10071
id: dao24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10041
lastpage: 10071
published: 2024-07-08 00:00:00 +0000
- title: 'Boosting Offline Optimizers with Surrogate Sensitivity'
abstract: 'Offline optimization is an important task in numerous material engineering domains where online experimentation to collect data is too expensive and needs to be replaced by an in silico maximization of a surrogate of the black-box function. Although such a surrogate can be learned from offline data, its prediction might not be reliable outside the offline data regime, which happens when the surrogate has a narrow prediction margin and is (therefore) sensitive to small perturbations of its parameterization. This raises the following questions: (1) how to regulate the sensitivity of a surrogate model; and (2) whether conditioning an offline optimizer with such a less sensitive surrogate will lead to better optimization performance. To address these questions, we develop an optimizable sensitivity measurement for the surrogate model, which then inspires a sensitivity-informed regularizer that is applicable to a wide range of offline optimizers. This development is both orthogonal and synergistic to prior research on offline optimization, which is demonstrated in our extensive experiment benchmark.'
volume: 235
URL: https://proceedings.mlr.press/v235/dao24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dao24b/dao24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dao24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Manh Cuong
family: Dao
- given: Phi Le
family: Nguyen
- given: Thao Nguyen
family: Truong
- given: Trong Nghia
family: Hoang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10072-10090
id: dao24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10072
lastpage: 10090
published: 2024-07-08 00:00:00 +0000
- title: 'Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data'
abstract: 'Ambient diffusion is a recently proposed framework for training diffusion models using corrupted data. Both Ambient Diffusion and alternative SURE-based approaches for learning diffusion models from corrupted data resort to approximations which deteriorate performance. We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data, solving an open problem in Ambient diffusion. Our key technical contribution is a method that uses a double application of Tweedie’s formula and a consistency loss function that allows us to extend sampling at noise levels below the observed data noise. We also provide further evidence that diffusion models memorize from their training sets by identifying extremely corrupted images that are almost perfectly reconstructed, raising copyright and privacy concerns. Our method for training using corrupted samples can be used to mitigate this problem. We demonstrate this by fine-tuning Stable Diffusion XL to generate samples from a distribution using only noisy samples. Our framework reduces the amount of memorization of the fine-tuning dataset, while maintaining competitive performance.'
volume: 235
URL: https://proceedings.mlr.press/v235/daras24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/daras24a/daras24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-daras24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Giannis
family: Daras
- given: Alex
family: Dimakis
- given: Constantinos Costis
family: Daskalakis
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10091-10108
id: daras24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10091
lastpage: 10108
published: 2024-07-08 00:00:00 +0000
- title: 'Larimar: Large Language Models with Episodic Memory Control'
abstract: 'Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar’s memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar not only attains accuracy comparable to the most competitive baselines, even in the challenging sequential editing setup, but also excels in speed—yielding speed-ups of 8-10x depending on the base LLM—as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar.'
volume: 235
URL: https://proceedings.mlr.press/v235/das24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/das24a/das24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-das24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Payel
family: Das
- given: Subhajit
family: Chaudhury
- given: Elliot
family: Nelson
- given: Igor
family: Melnyk
- given: Sarathkrishna
family: Swaminathan
- given: Sihui
family: Dai
- given: Aurelie
family: Lozano
- given: Georgios
family: Kollias
- given: Vijil
family: Chenthamarakshan
- given: Jiri
family: Navratil
- given: Soham
family: Dan
- given: Pin-Yu
family: Chen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10109-10126
id: das24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10109
lastpage: 10126
published: 2024-07-08 00:00:00 +0000
- title: 'Understanding the Training Speedup from Sampling with Approximate Losses'
abstract: 'It is well known that selecting samples with large losses/gradients can significantly reduce the number of training steps. However, the selection overhead is often too high to yield any meaningful gains in terms of overall training time. In this work, we focus on the greedy approach of selecting samples with large *approximate losses* instead of exact losses in order to reduce the selection overhead. For smooth convex losses, we show that such a greedy strategy can converge to a constant factor of the minimum value of the average loss in fewer iterations than the standard approach of random selection. We also theoretically quantify the effect of the approximation level. We then develop SIFT which uses early exiting to obtain approximate losses with an intermediate layer’s representations for sample selection. We evaluate SIFT on the task of training a 110M parameter 12 layer BERT base model, and show significant gains (in terms of training hours and number of backpropagation steps) without any optimized implementation over vanilla training. For example, to reach 64% validation accuracy, SIFT with exit at the first layer takes $\sim$ 43 hours compared to $\sim$ 57 hours of vanilla training.'
volume: 235
URL: https://proceedings.mlr.press/v235/das24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/das24b/das24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-das24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rudrajit
family: Das
- given: Xi
family: Chen
- given: Bertram
family: Ieong
- given: Parikshit
family: Bansal
- given: Sujay
family: Sanghavi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10127-10147
id: das24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10127
lastpage: 10147
published: 2024-07-08 00:00:00 +0000
- title: 'A decoder-only foundation model for time-series forecasting'
abstract: 'Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a decoder style attention model with input patching, using a large time-series corpus comprising both real-world and synthetic datasets. Experiments on a diverse set of previously unseen forecasting datasets suggest that the model can yield accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities.'
volume: 235
URL: https://proceedings.mlr.press/v235/das24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/das24c/das24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-das24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Abhimanyu
family: Das
- given: Weihao
family: Kong
- given: Rajat
family: Sen
- given: Yichen
family: Zhou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10148-10167
id: das24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10148
lastpage: 10167
published: 2024-07-08 00:00:00 +0000
- title: 'Disparate Impact on Group Accuracy of Linearization for Private Inference'
abstract: 'Ensuring privacy-preserving inference on cryptographically secure data is a well-known computational challenge. To alleviate the bottleneck of costly cryptographic computations in non-linear activations, recent methods have suggested linearizing a targeted portion of these activations in neural networks. This technique results in significantly reduced runtimes with often negligible impacts on accuracy. In this paper, we demonstrate that such computational benefits may lead to increased fairness costs. Specifically, we find that reducing the number of ReLU activations disproportionately decreases the accuracy for minority groups compared to majority groups. To explain these observations, we provide a mathematical interpretation under restricted assumptions about the nature of the decision boundary, while also showing the prevalence of this problem across widely used datasets and architectures. Finally, we show how a simple procedure altering the finetuning step for linearized models can serve as an effective mitigation strategy.'
volume: 235
URL: https://proceedings.mlr.press/v235/das24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/das24d/das24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-das24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Saswat
family: Das
- given: Marco
family: Romanelli
- given: Ferdinando
family: Fioretto
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10168-10184
id: das24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10168
lastpage: 10184
published: 2024-07-08 00:00:00 +0000
- title: 'New Bounds on the Cohesion of Complete-link and Other Linkage Methods for Agglomerative Clustering'
abstract: 'Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance, the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces. One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is producing compact clusters. We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.'
volume: 235
URL: https://proceedings.mlr.press/v235/dasgupta24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dasgupta24a/dasgupta24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dasgupta24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sanjoy
family: Dasgupta
- given: Eduardo Sany
family: Laber
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10185-10205
id: dasgupta24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10185
lastpage: 10205
published: 2024-07-08 00:00:00 +0000
- title: 'Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction'
abstract: 'How can a scientist use a Reinforcement Learning (RL) algorithm to design experiments over a dynamical system’s state space? In the case of finite and Markovian systems, an area called *Active Exploration* (AE) relaxes the optimization problem of experiments design into Convex RL, a generalization of RL admitting a wider notion of reward. Unfortunately, this framework is currently not scalable and the potential of AE is hindered by the vastness of experiments spaces typical of scientific discovery applications. However, these spaces are often endowed with natural geometries, e.g., permutation invariance in molecular design, that an agent could leverage to improve the statistical and computational efficiency of AE. To achieve this, we bridge AE and MDP homomorphisms, which offer a way to exploit known geometric structures via abstraction. Towards this goal, we make two fundamental contributions: we extend MDP homomorphisms formalism to Convex RL, and we present, to the best of our knowledge, the first analysis that formally captures the benefit of abstraction via homomorphisms on sample efficiency. Ultimately, we propose the Geometric Active Exploration (GAE) algorithm, which we analyse theoretically and experimentally in environments motivated by problems in scientific discovery.'
volume: 235
URL: https://proceedings.mlr.press/v235/de-santi24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/de-santi24a/de-santi24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-de-santi24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Riccardo
family: De Santi
- given: Federico Arangath
family: Joseph
- given: Noah
family: Liniger
- given: Mirco
family: Mutti
- given: Andreas
family: Krause
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10206-10234
id: de-santi24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10206
lastpage: 10234
published: 2024-07-08 00:00:00 +0000
- title: 'Global Reinforcement Learning : Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods'
abstract: 'In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many real-world applications such as experiment design, exploration, imitation learning, and risk-averse RL to name a few. This is due to the fact that additive objectives disregard interactions between states that are crucial for certain tasks. To tackle this problem, we introduce *Global* RL (GRL), where rewards are *globally* defined over trajectories instead of *locally* over states. Global rewards can capture *negative interactions* among states, e.g., in exploration, via submodularity, *positive interactions*, e.g., synergetic effects, via supermodularity, and mixed interactions via combinations of both. By exploiting ideas from submodular optimization, we propose a novel algorithmic scheme that converts any GRL problem to a sequence of classic RL problems and solves it efficiently with curvature-dependent approximation guarantees. We also provide hardness of approximation results and empirically demonstrate the effectiveness of our method on several GRL instances.'
volume: 235
URL: https://proceedings.mlr.press/v235/de-santi24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/de-santi24b/de-santi24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-de-santi24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Riccardo
family: De Santi
- given: Manish
family: Prajapat
- given: Andreas
family: Krause
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10235-10266
id: de-santi24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10235
lastpage: 10266
published: 2024-07-08 00:00:00 +0000
- title: 'Provably Better Explanations with Optimized Aggregation of Feature Attributions'
abstract: 'Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models. Despite the numerous techniques available, individual methods often produce inconsistent and unstable results, putting their overall reliability into question. In this work, we aim to systematically improve the quality of feature attributions by combining multiple explanations across distinct methods or their variations. For this purpose, we propose a novel approach to derive optimal convex combinations of feature attributions that yield provable improvements of desired quality criteria such as robustness or faithfulness to the model behavior. Through extensive experiments involving various model architectures and popular feature attribution techniques, we demonstrate that our combination strategy consistently outperforms individual methods and existing baselines.'
volume: 235
URL: https://proceedings.mlr.press/v235/decker24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/decker24a/decker24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-decker24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Decker
- given: Ananta R.
family: Bhattarai
- given: Jindong
family: Gu
- given: Volker
family: Tresp
- given: Florian
family: Buettner
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10267-10286
id: decker24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10267
lastpage: 10286
published: 2024-07-08 00:00:00 +0000
- title: 'Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments'
abstract: 'Despite their stellar performance on a wide range of tasks, including in-context tasks only revealed during inference, vanilla transformers and variants trained for next-token predictions (a) do not learn an explicit world model of their environment which can be flexibly queried and (b) cannot be used for planning or navigation. In this paper, we consider partially observed environments (POEs), where an agent receives perceptually aliased observations as it navigates, which makes path planning hard. We introduce a transformer with (multiple) discrete bottleneck(s), TDB, whose latent codes learn a compressed representation of the history of observations and actions. After training a TDB to predict the future observation(s) given the history, we extract interpretable cognitive maps of the environment from its active bottleneck(s) indices. These maps are then paired with an external solver to solve (constrained) path planning problems. First, we show that a TDB trained on POEs (a) retains the near-perfect predictive performance of a vanilla transformer or an LSTM while (b) solving shortest path problems exponentially faster. Second, a TDB extracts interpretable representations from text datasets, while reaching higher in-context accuracy than vanilla sequence models. Finally, in new POEs, a TDB (a) reaches near-perfect in-context accuracy, (b) learns accurate in-context cognitive maps, and (c) solves in-context path planning problems.'
volume: 235
URL: https://proceedings.mlr.press/v235/dedieu24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dedieu24a/dedieu24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dedieu24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Antoine
family: Dedieu
- given: Wolfgang
family: Lehrach
- given: Guangyao
family: Zhou
- given: Dileep
family: George
- given: Miguel
family: Lazaro-Gredilla
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10287-10316
id: dedieu24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10287
lastpage: 10316
published: 2024-07-08 00:00:00 +0000
- title: 'Asymptotically Optimal and Computationally Efficient Average Treatment Effect Estimation in A/B testing'
abstract: 'Motivated by practical applications in clinical trials and online platforms, we study A/B testing with the aim of estimating a confidence interval (CI) for the average treatment effect (ATE) using the minimum expected sample size. This CI should have a width at most $\epsilon$ while ensuring that the probability of the CI not containing the true ATE is at most $\delta$. To answer this, we first establish a lower bound on the expected sample size needed for any adaptive policy which constructs a CI of ATE with desired properties. Specifically, we prove that the lower bound is based on the solution to a max-min non-convex optimization problem for small $\delta$. Tailoring the “plug-in” approach for the ATE problem, we construct an adaptive policy that is asymptotically optimal, i.e., matches the lower bound on the expected sample size for small $\delta$. Interestingly, we find that, for small $\epsilon$ and $\delta$, the asymptotically optimal fraction of treatment assignment for A and B is proportional to the standard deviation of the outcome distributions of treatments A and B, respectively. However, as the proposed approach can be computationally intensive, we propose an alternative adaptive policy. This new policy, informed by insights from our lower bound analysis, is computationally efficient while remaining asymptotically optimal for small values of $\epsilon$ and $\delta$. Numerical comparisons demonstrate that both policies perform similarly across practical values of $\epsilon$ and $\delta$, offering efficient solutions for A/B testing.'
volume: 235
URL: https://proceedings.mlr.press/v235/deep24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deep24a/deep24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deep24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vikas
family: Deep
- given: Achal
family: Bassamboo
- given: Sandeep Kumar
family: Juneja
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10317-10367
id: deep24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10317
lastpage: 10367
published: 2024-07-08 00:00:00 +0000
- title: 'Predicting Lagrangian Multipliers for Mixed Integer Linear Programs'
abstract: 'Lagrangian Relaxation stands among the most efficient approaches for solving Mixed Integer Linear Programs (MILPs) with difficult constraints. Given any duals for these constraints, called Lagrangian Multipliers (LMs), it returns a bound on the optimal value of the MILP, and Lagrangian methods seek the LMs giving the best such bound. But these methods generally rely on iterative algorithms resembling gradient descent to maximize the concave piecewise linear dual function: the computational burden grows quickly with the number of relaxed constraints. We introduce a deep learning approach that bypasses the descent, effectively amortizing per instance optimization. A probabilistic encoder based on a graph neural network computes, given a MILP instance and its Continuous Relaxation (CR) solution, high-dimensional representations of relaxed constraints, which are turned into LMs by a decoder. We train the encoder and the decoder jointly by directly optimizing the bound obtained from the predicted multipliers. Our method is applicable to any problem with a compact MILP formulation, and to any Lagrangian Relaxation providing a tighter bound than CR. Experiments on two widely known problems, Multi-Commodity Network Design and Generalized Assignment, show that our approach closes up to 85% of the gap between the continuous relaxation and the best Lagrangian bound, and provides a high-quality warm-start for descent-based Lagrangian methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/demelas24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/demelas24a/demelas24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-demelas24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Francesco
family: Demelas
- given: Joseph Le
family: Roux
- given: Mathieu
family: Lacroix
- given: Axel
family: Parmentier
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10368-10384
id: demelas24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10368
lastpage: 10384
published: 2024-07-08 00:00:00 +0000
- title: 'Prediction-powered Generalization of Causal Inferences'
abstract: 'Causal inferences from a randomized controlled trial (RCT) may not pertain to a *target* population where some effect modifiers have a different distribution. Prior work studies *generalizing* the results of a trial to a target population with no outcome but covariate data available. We show how the limited size of trials makes generalization a statistically infeasible task, as it requires estimating complex nuisance functions. We develop generalization algorithms that supplement the trial data with a prediction model learned from an additional *observational* study (OS), without making *any* assumptions on the OS. We theoretically and empirically show that our methods facilitate better generalization when the OS is "high-quality", and remain robust when it is not, *e.g.*, when it has unmeasured confounding.'
volume: 235
URL: https://proceedings.mlr.press/v235/demirel24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/demirel24a/demirel24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-demirel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ilker
family: Demirel
- given: Ahmed
family: Alaa
- given: Anthony
family: Philippakis
- given: David
family: Sontag
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10385-10408
id: demirel24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10385
lastpage: 10408
published: 2024-07-08 00:00:00 +0000
- title: 'An Unsupervised Approach for Periodic Source Detection in Time Series'
abstract: 'Detection of periodic patterns of interest within noisy time series data plays a critical role in various tasks, spanning from health monitoring to behavior analysis. Existing learning techniques often rely on labels or clean versions of signals for detecting the periodicity, and those employing self-supervised methods are required to apply proper augmentations, which is already challenging for time series and can result in collapse—all representations collapse to a single point due to strong augmentation. In this work, we propose a novel method to detect the periodicity in time series without the need for any labels or requiring tailored positive or negative data generation mechanisms. We mitigate the collapse issue by ensuring the learned representations retain information from the original samples without imposing any variance constraints on the batch. Our experiments in three time-series tasks against state-of-the-art learning methods show that the proposed approach consistently outperforms prior works, achieving performance improvements of more than 45–50%, showing its effectiveness.'
volume: 235
URL: https://proceedings.mlr.press/v235/demirel24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/demirel24b/demirel24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-demirel24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Berken Utku
family: Demirel
- given: Christian
family: Holz
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10409-10439
id: demirel24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10409
lastpage: 10439
published: 2024-07-08 00:00:00 +0000
- title: 'Multi-group Learning for Hierarchical Groups'
abstract: 'The multi-group learning model formalizes the learning scenario in which a single predictor must generalize well on multiple, possibly overlapping subgroups of interest. We extend the study of multi-group learning to the natural case where the groups are hierarchically structured. We design an algorithm for this setting that outputs an interpretable and deterministic decision tree predictor with near-optimal sample complexity. We then conduct an empirical evaluation of our algorithm and find that it achieves attractive generalization properties on real datasets with hierarchical group structure.'
volume: 235
URL: https://proceedings.mlr.press/v235/deng24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deng24a/deng24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deng24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Samuel
family: Deng
- given: Daniel
family: Hsu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10440-10487
id: deng24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10440
lastpage: 10487
published: 2024-07-08 00:00:00 +0000
- title: 'A3S: A General Active Clustering Method with Pairwise Constraints'
abstract: 'Active clustering aims to boost the clustering performance by integrating human-annotated pairwise constraints through strategic querying. Conventional approaches with semi-supervised clustering schemes encounter high query costs when applied to large datasets with numerous classes. To address these limitations, we propose a novel Adaptive Active Aggregation and Splitting (A3S) framework, falling within the cluster-adjustment scheme in active clustering. A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm. In particular, our cluster adjustment is inspired by the quantitative analysis of Normalized mutual information gain under the information theory framework and can provably improve the clustering quality. The proposed A3S framework significantly elevates the performance and scalability of active clustering. In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries compared with existing methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/deng24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deng24b/deng24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deng24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xun
family: Deng
- given: Junlong
family: Liu
- given: Han
family: Zhong
- given: Fuli
family: Feng
- given: Chen
family: Shen
- given: Xiangnan
family: He
- given: Jieping
family: Ye
- given: Zheng
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10488-10505
id: deng24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10488
lastpage: 10505
published: 2024-07-08 00:00:00 +0000
- title: 'Variational Schrödinger Diffusion Models'
abstract: 'Schrödinger bridge (SB) has emerged as the go-to method for optimizing transportation plans in diffusion models. However, SB requires estimating the intractable forward score functions, inevitably resulting in the (costly) implicit training loss based on simulated trajectories. To improve the scalability while preserving efficient transportation plans, we leverage variational inference to linearize the forward score functions (variational scores) of SB and restore *simulation-free* properties in training backward scores. We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport. Theoretically, we use stochastic approximation to prove the convergence of the variational scores and show the convergence of the adaptively generated samples based on the optimal variational scores. Empirically, we test the algorithm in simulated examples and observe that VSDM is efficient in generations of anisotropic shapes and yields straighter sample trajectories compared to the single-variate diffusion. We also verify the scalability of the algorithm in real-world data and achieve competitive unconditional generation performance in CIFAR10 and conditional generation in time series modeling. Notably, VSDM no longer depends on warm-up initializations required by SB.'
volume: 235
URL: https://proceedings.mlr.press/v235/deng24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deng24c/deng24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deng24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wei
family: Deng
- given: Weijian
family: Luo
- given: Yixin
family: Tan
- given: Marin
family: Biloš
- given: Yu
family: Chen
- given: Yuriy
family: Nevmyvaka
- given: Ricky T. Q.
family: Chen
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10506-10529
id: deng24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10506
lastpage: 10529
published: 2024-07-08 00:00:00 +0000
- title: 'Collaborative Learning with Different Labeling Functions'
abstract: 'We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions, while minimizing the number of samples drawn from them in total. Unlike in the usual collaborative learning setup, it is not assumed that there exists a single classifier that is simultaneously accurate for all distributions. We show that, when the data distributions satisfy a weaker realizability assumption, which appeared in (Crammer & Mansour, 2012) in the context of multi-task learning, sample-efficient learning is still feasible. We give a learning algorithm based on Empirical Risk Minimization (ERM) on a natural augmentation of the hypothesis class, and the analysis relies on an upper bound on the VC dimension of this augmented class. In terms of the computational efficiency, we show that ERM on the augmented hypothesis class is $\mathsf{NP}$-hard, which gives evidence against the existence of computationally efficient learners in general. On the positive side, for two special cases, we give learners that are both sample- and computationally-efficient.'
volume: 235
URL: https://proceedings.mlr.press/v235/deng24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deng24d/deng24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deng24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuyang
family: Deng
- given: Mingda
family: Qiao
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10530-10552
id: deng24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10530
lastpage: 10552
published: 2024-07-08 00:00:00 +0000
- title: 'Exploring the Low-Pass Filtering Behavior in Image Super-Resolution'
abstract: 'Deep neural networks for image super-resolution (ISR) have shown significant advantages over traditional approaches like interpolation. However, they are often criticized as ‘black boxes’ compared to traditional approaches with solid mathematical foundations. In this paper, we attempt to interpret the behavior of deep neural networks in ISR using theories from the field of signal processing. First, we report an intriguing phenomenon, referred to as ‘the sinc phenomenon.’ It occurs when an impulse input is fed to a neural network. Then, building on this observation, we propose a method named Hybrid Response Analysis (HyRA) to analyze the behavior of neural networks in ISR tasks. Specifically, HyRA decomposes a neural network into a parallel connection of a linear system and a non-linear system and demonstrates that the linear system functions as a low-pass filter while the non-linear system injects high-frequency information. Finally, to quantify the injected high-frequency information, we introduce a metric for image-to-image tasks called Frequency Spectrum Distribution Similarity (FSDS). FSDS reflects the distribution similarity of different frequency components and can capture nuances that traditional metrics may overlook. Code, videos and raw experimental results for this paper can be found at: https://github.com/RisingEntropy/LPFInISR.'
volume: 235
URL: https://proceedings.mlr.press/v235/deng24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deng24e/deng24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deng24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Haoyu
family: Deng
- given: Zijing
family: Xu
- given: Yule
family: Duan
- given: Xiao
family: Wu
- given: Wenjie
family: Shu
- given: Liang-Jian
family: Deng
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10553-10573
id: deng24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10553
lastpage: 10573
published: 2024-07-08 00:00:00 +0000
- title: 'Network Tight Community Detection'
abstract: 'Conventional community detection methods often categorize all nodes into clusters. However, the presumed community structure of interest may only be valid for a subset of nodes (termed ‘tight nodes’), while the rest of the network may consist of noninformative ‘scattered nodes’. For example, a protein-protein network often contains proteins that do not belong to specific biological functional modules but are involved in more general processes, or act as bridges between different functional modules. Forcing each of these proteins into a single cluster introduces unwanted biases and obscures the underlying biological implication. To address this issue, we propose a tight community detection (TCD) method to identify tight communities excluding scattered nodes. The algorithm enjoys a strong theoretical guarantee of tight node identification accuracy and is scalable for large networks. The superiority of the proposed method is demonstrated by various synthetic and real experiments.'
volume: 235
URL: https://proceedings.mlr.press/v235/deng24f.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deng24f/deng24f.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deng24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiayi
family: Deng
- given: Xiaodong
family: Yang
- given: Jun
family: Yu
- given: Jun
family: Liu
- given: Zhaiming
family: Shen
- given: Danyang
family: Huang
- given: Huimin
family: Cheng
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10574-10596
id: deng24f
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10574
lastpage: 10596
published: 2024-07-08 00:00:00 +0000
- title: 'Going beyond Compositions, DDPMs Can Produce Zero-Shot Interpolations'
abstract: 'Denoising Diffusion Probabilistic Models (DDPMs) exhibit remarkable capabilities in image generation, with studies suggesting that they can generalize by composing latent factors learned from the training data. In this work, we go further and study DDPMs trained on strictly separate subsets of the data distribution with large gaps on the support of the latent factors. We show that such a model can effectively generate images in the unexplored, intermediate regions of the distribution. For instance, when trained on clearly smiling and non-smiling faces, we demonstrate a sampling procedure which can generate slightly smiling faces without reference images (zero-shot interpolation). We replicate these findings for other attributes as well as other datasets. Our code is available on GitHub.'
volume: 235
URL: https://proceedings.mlr.press/v235/deschenaux24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deschenaux24a/deschenaux24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deschenaux24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Justin
family: Deschenaux
- given: Igor
family: Krawczuk
- given: Grigorios
family: Chrysos
- given: Volkan
family: Cevher
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10597-10623
id: deschenaux24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10597
lastpage: 10623
published: 2024-07-08 00:00:00 +0000
- title: 'Multicalibration for Confidence Scoring in LLMs'
abstract: 'This paper proposes the use of "multicalibration" to yield interpretable and reliable confidence scores for outputs generated by large language models (LLMs). Multicalibration asks for calibration not just marginally, but simultaneously across various intersecting groupings of the data. We show how to form groupings for prompt/completion pairs that are correlated with the probability of correctness via two techniques: clustering within an embedding space, and "self-annotation" - querying the LLM by asking it various yes-or-no questions about the prompt. We also develop novel variants of multicalibration algorithms that offer performance improvements by reducing their tendency to overfit. Through systematic benchmarking across various question answering datasets and LLMs, we show how our techniques can yield confidence scores that provide substantial improvements in fine-grained measures of both calibration and accuracy compared to existing methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/detommaso24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/detommaso24a/detommaso24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-detommaso24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gianluca
family: Detommaso
- given: Martin Andres
family: Bertran
- given: Riccardo
family: Fogliato
- given: Aaron
family: Roth
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10624-10641
id: detommaso24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10624
lastpage: 10641
published: 2024-07-08 00:00:00 +0000
- title: 'Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning'
abstract: 'Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models force a tradeoff between accuracy and interpretability, limiting data-driven interpretations of human decision-making processes. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically under different contexts. Thus, we develop Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem, where each context poses a unique task and complex decision policies can be constructed piece-wise from many simple context-specific policies. CPR models each context-specific policy as a linear map, and generates new policy models *on-demand* as contexts are updated with new observations. We provide two flavors of the CPR framework: one focusing on exact local interpretability, and one retaining full global interpretability. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on predicting antibiotic prescription in intensive care units ($+22$% AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer’s patients ($+7.7$% AUROC vs. previous SOTA). With this improvement, CPR closes the accuracy gap between interpretable and black-box methods, allowing high-resolution exploration and analysis of context-specific decision models.'
volume: 235
URL: https://proceedings.mlr.press/v235/deuschel24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deuschel24a/deuschel24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deuschel24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jannik
family: Deuschel
- given: Caleb
family: Ellington
- given: Yingtao
family: Luo
- given: Ben
family: Lengerich
- given: Pascal
family: Friederich
- given: Eric P.
family: Xing
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10642-10660
id: deuschel24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10642
lastpage: 10660
published: 2024-07-08 00:00:00 +0000
- title: 'Stability and Multigroup Fairness in Ranking with Uncertain Predictions'
abstract: 'Rankings are ubiquitous across many applications, from search engines to hiring committees. In practice, many rankings are derived from the output of predictors. However, when predictors trained for classification tasks have intrinsic uncertainty, it is not obvious how this uncertainty should be represented in the derived rankings. Our work considers ranking functions: maps from individual predictions for a classification task to distributions over rankings. We focus on two aspects of ranking functions: stability to perturbations in predictions and fairness towards both individuals and subgroups. Not only is stability an important requirement for its own sake, but — as we show — it composes harmoniously with individual fairness in the sense of Dwork et al. (2012). While deterministic ranking functions cannot be stable aside from trivial scenarios, we show that the recently proposed uncertainty aware (UA) ranking functions of Singh et al. (2021) are stable. Our main result is that UA rankings also achieve group fairness through successful composition with multiaccurate or multicalibrated predictors. Our work demonstrates that UA rankings naturally interpolate between group and individual level fairness guarantees, while simultaneously satisfying stability guarantees important whenever machine-learned predictions are used.'
volume: 235
URL: https://proceedings.mlr.press/v235/devic24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/devic24a/devic24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-devic24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siddartha
family: Devic
- given: Aleksandra
family: Korolova
- given: David
family: Kempe
- given: Vatsal
family: Sharan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10661-10686
id: devic24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10661
lastpage: 10686
published: 2024-07-08 00:00:00 +0000
- title: 'Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies'
abstract: 'Many multi-agent systems in practice are decentralized and have dynamically varying dependencies. Few attempts have been made in the literature to analyze these systems theoretically. In this paper, we propose and theoretically analyze a decentralized model with dynamically varying dependencies called the Locally Interdependent Multi-Agent MDP. This model can represent problems in many disparate domains such as cooperative navigation, obstacle avoidance, and formation control. Despite the intractability that general partially observable multi-agent systems suffer from, we propose three closed-form policies that are theoretically near-optimal in this setting and are scalable to compute and store. Consequently, we reveal a fundamental property of Locally Interdependent Multi-Agent MDPs: the partially observable decentralized solution is exponentially close to the fully observable solution with respect to the visibility radius. We then discuss extensions of our closed-form policies to further improve tractability. We conclude by providing simulations to investigate some long-horizon behaviors of our closed-form policies.'
volume: 235
URL: https://proceedings.mlr.press/v235/deweese24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/deweese24a/deweese24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-deweese24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alex
family: Deweese
- given: Guannan
family: Qu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10687-10709
id: deweese24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10687
lastpage: 10709
published: 2024-07-08 00:00:00 +0000
- title: 'Bivariate Causal Discovery using Bayesian Model Selection'
abstract: 'Much of the causal discovery literature prioritises guaranteeing the identifiability of causal direction in statistical models. For structures within a Markov equivalence class, this requires strong assumptions which may not hold in real-world datasets, ultimately limiting the usability of these methods. Building on previous attempts, we show how to incorporate causal assumptions within the Bayesian framework. Identifying causal direction then becomes a Bayesian model selection problem. This enables us to construct models with realistic assumptions, and consequently allows for the differentiation between Markov equivalent causal structures. We analyse why Bayesian model selection works in situations where methods based on maximum likelihood fail. To demonstrate our approach, we construct a Bayesian non-parametric model that can flexibly model the joint distribution. We then outperform previous methods on a wide range of benchmark datasets with varying data generating assumptions.'
volume: 235
URL: https://proceedings.mlr.press/v235/dhir24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dhir24a/dhir24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dhir24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anish
family: Dhir
- given: Samuel
family: Power
- given: Mark
family: Van Der Wilk
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10710-10735
id: dhir24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10710
lastpage: 10735
published: 2024-07-08 00:00:00 +0000
- title: 'Trust Regions for Explanations via Black-Box Probabilistic Certification'
abstract: 'Given the black box nature of machine learning models, a plethora of explainability methods have been developed to decipher the factors behind individual decisions. In this paper, we introduce a novel problem of black box (probabilistic) explanation certification. We ask the question: Given a black box model with only query access, an explanation for an example and a quality metric (viz. fidelity, stability), can we find the largest hypercube (i.e., $\ell_{\infty}$ ball) centered at the example such that when the explanation is applied to all examples within the hypercube, (with high probability) a quality criterion is met (viz. fidelity greater than some value)? Being able to efficiently find such a *trust region* has multiple benefits: i) insight into model behavior in a *region*, with a *guarantee*; ii) ascertained *stability* of the explanation; iii) *explanation reuse*, which can save time, energy and money by not having to find explanations for every example; and iv) a possible *meta-metric* to compare explanation methods. Our contributions include formalizing this problem, proposing solutions, providing theoretical guarantees for these solutions that are computable, and experimentally showing their efficacy on synthetic and real data.'
volume: 235
URL: https://proceedings.mlr.press/v235/dhurandhar24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dhurandhar24a/dhurandhar24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dhurandhar24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amit
family: Dhurandhar
- given: Swagatam
family: Haldar
- given: Dennis
family: Wei
- given: Karthikeyan
family: Natesan Ramamurthy
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10736-10764
id: dhurandhar24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10736
lastpage: 10764
published: 2024-07-08 00:00:00 +0000
- title: 'Double Stochasticity Gazes Faster: Snap-Shot Decentralized Stochastic Gradient Tracking Methods'
abstract: 'In decentralized optimization, $m$ agents form a network and only communicate with their neighbors, which gives advantages in data ownership, privacy, and scalability. At the same time, decentralized stochastic gradient descent ($\texttt{SGD}$) methods, as popular decentralized algorithms for training large-scale machine learning models, have shown their superiority over centralized counterparts. Distributed stochastic gradient tracking $\texttt{DSGT}$ has been recognized as the popular and state-of-the-art decentralized $\texttt{SGD}$ method due to its proper theoretical guarantees. However, the theoretical analysis of $\texttt{DSGT}$ shows that its iteration complexity is $\tilde{\mathcal{O}} \left(\frac{\bar{\sigma}^2}{m\mu \varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu(1 - \lambda_2(W))^{1/2} C_W \sqrt{\varepsilon} }\right)$, where the doubly stochastic matrix $W$ represents the network topology and $ C_W $ is a parameter that depends on $W$. Thus, it indicates that the convergence property of $\texttt{DSGT}$ is heavily affected by the topology of the communication network. To overcome the weakness of $\texttt{DSGT}$, we resort to the snap-shot gradient tracking skill and propose two novel algorithms, snap-shot $\texttt{DSGT}$ ($\texttt{SS-DSGT}$) and accelerated snap-shot $\texttt{DSGT}$ ($\texttt{ASS-DSGT}$). We further justify that $\texttt{SS-DSGT}$ exhibits a lower iteration complexity compared to $\texttt{DSGT}$ in the general communication network topology. Additionally, $\texttt{ASS-DSGT}$ matches $\texttt{DSGT}$’s iteration complexity $\mathcal{O}\left( \frac{\bar{\sigma}^2}{m\mu \varepsilon} + \frac{\sqrt{L}\bar{\sigma}}{\mu (1 - \lambda_2(W))^{1/2}\sqrt{\varepsilon}} \right)$ under the same conditions as $\texttt{DSGT}$. 
Numerical experiments validate $\texttt{SS-DSGT}$’s superior performance in the general communication network topology and exhibit better practical performance of $\texttt{ASS-DSGT}$ on the specified $W$ compared to $\texttt{DSGT}$.'
volume: 235
URL: https://proceedings.mlr.press/v235/di24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/di24a/di24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-di24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hao
family: Di
- given: Haishan
family: Ye
- given: Xiangyu
family: Chang
- given: Guang
family: Dai
- given: Ivor
family: Tsang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10765-10791
id: di24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10765
lastpage: 10791
published: 2024-07-08 00:00:00 +0000
- title: 'Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient'
abstract: 'Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands $\mathcal{O}(d)$ function evaluations ($d$ is the dimension size), which incurs substantial computational costs and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction ($\texttt{ZPDVR}$) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances. Compared to prior methods, $\texttt{ZPDVR}$ relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) in expectation $\mathcal{O}(1)$ times per iteration, and achieves the optimal $\mathcal{O}(d(n + \kappa)\log (\frac{1}{\epsilon}))$ SZO query complexity in the strongly convex and smooth setting, where $\kappa$ represents the condition number and $\epsilon$ is the desired accuracy. Empirical results validate $\texttt{ZPDVR}$’s linear convergence and demonstrate its superior performance over other related methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/di24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/di24b/di24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-di24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hao
family: Di
- given: Haishan
family: Ye
- given: Yueling
family: Zhang
- given: Xiangyu
family: Chang
- given: Guang
family: Dai
- given: Ivor
family: Tsang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10792-10810
id: di24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10792
lastpage: 10810
published: 2024-07-08 00:00:00 +0000
- title: 'Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination'
abstract: 'We study Gaussian sparse estimation tasks in Huber’s contamination model with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian robust $k$-sparse mean estimation on $\mathbb{R}^d$ with corruption rate $\epsilon>0$, our algorithm has sample complexity $(k^2/\epsilon ^2)\mathrm{polylog}(d/\epsilon)$, runs in sample polynomial time, and approximates the target mean within $\ell_2$-error $O(\epsilon)$. Previous efficient algorithms inherently incur error $\Omega(\epsilon \sqrt{\log(1/\epsilon)})$. At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.'
volume: 235
URL: https://proceedings.mlr.press/v235/diakonikolas24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/diakonikolas24a/diakonikolas24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-diakonikolas24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ilias
family: Diakonikolas
- given: Daniel
family: Kane
- given: Sushrut
family: Karmalkar
- given: Ankit
family: Pensia
- given: Thanasis
family: Pittas
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10811-10840
id: diakonikolas24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10811
lastpage: 10840
published: 2024-07-08 00:00:00 +0000
- title: 'Fast Co-Training under Weak Dependence via Stream-Based Active Learning'
abstract: 'Co-training is a classical semi-supervised learning method which only requires a small number of labeled examples for learning, under reasonable assumptions. Despite extensive literature on the topic, very few hypothesis classes are known to be provably efficiently learnable via co-training, even under very strong distributional assumptions. In this work, we study the co-training problem in the stream-based active learning model. We show that a range of natural concept classes are efficiently learnable via co-training, in terms of both label efficiency and computational efficiency. We provide an efficient reduction of co-training under the standard assumption of weak dependence, in the stream-based active model, to online classification. As a corollary, we obtain efficient co-training algorithms with error-independent label complexity for every concept class efficiently learnable in the mistake bound online model. Our framework also gives co-training algorithms with label complexity $\tilde{O}(d\log (1/\epsilon))$ for any concept class with VC dimension $d$, though in general this reduction is not computationally efficient. Finally, using additional ideas from online learning, we design the first efficient co-training algorithms with label complexity $\tilde{O}(d^2\log (1/\epsilon))$ for several concept classes, including unions of intervals and homogeneous halfspaces.'
volume: 235
URL: https://proceedings.mlr.press/v235/diakonikolas24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/diakonikolas24b/diakonikolas24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-diakonikolas24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ilias
family: Diakonikolas
- given: Mingchen
family: Ma
- given: Lisheng
family: Ren
- given: Christos
family: Tzamos
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10841-10864
id: diakonikolas24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10841
lastpage: 10864
published: 2024-07-08 00:00:00 +0000
- title: 'Convex and Bilevel Optimization for Neural-Symbolic Inference and Learning'
abstract: 'We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems. We demonstrate our framework with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm-starts. This leads to over $100 \times$ learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across $8$ datasets covering a range of tasks and demonstrate our learning framework achieves up to a $16$% point prediction performance improvement over alternative learning methods.'
volume: 235
URL: https://proceedings.mlr.press/v235/dickens24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dickens24a/dickens24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dickens24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Charles Andrew
family: Dickens
- given: Changyu
family: Gao
- given: Connor
family: Pryor
- given: Stephen
family: Wright
- given: Lise
family: Getoor
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10865-10896
id: dickens24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10865
lastpage: 10896
published: 2024-07-08 00:00:00 +0000
- title: 'Structure Your Data: Towards Semantic Graph Counterfactuals'
abstract: 'Counterfactual explanations (CEs) based on concepts are explanations that consider alternative scenarios to understand which high-level semantic features contributed to particular model predictions. In this work, we propose CEs based on the semantic graphs accompanying input data to achieve more descriptive, accurate, and human-aligned explanations. Building upon state-of-the-art (SotA) conceptual attempts, we adopt a model-agnostic edit-based approach and leverage GNNs for efficient Graph Edit Distance (GED) computation. With a focus on the visual domain, we represent images as scene graphs and obtain their GNN embeddings to bypass solving the NP-hard graph similarity problem for all input pairs, an integral part of the CE computation process. We apply our method to benchmark and real-world datasets with varying difficulty and availability of semantic annotations. Testing on diverse classifiers, we find that our CEs outperform previous SotA explanation models based on semantics, including both white- and black-box as well as conceptual and pixel-level approaches. Their superiority is proven quantitatively and qualitatively, as validated by human subjects, highlighting the significance of leveraging semantic edges in the presence of intricate relationships. Our model-agnostic graph-based approach is widely applicable and easily extensible, producing actionable explanations across different contexts. The code is available at https://github.com/aggeliki-dimitriou/SGCE.'
volume: 235
URL: https://proceedings.mlr.press/v235/dimitriou24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dimitriou24a/dimitriou24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dimitriou24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Angeliki
family: Dimitriou
- given: Maria
family: Lymperaiou
- given: Georgios
family: Filandrianos
- given: Konstantinos
family: Thomas
- given: Giorgos
family: Stamou
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10897-10926
id: dimitriou24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10897
lastpage: 10926
published: 2024-07-08 00:00:00 +0000
- title: 'Efficient Algorithms for Sum-Of-Minimum Optimization'
abstract: 'In this work, we propose a novel optimization model termed “sum-of-minimum” optimization. This model seeks to minimize the sum or average of $N$ objective functions over $k$ parameters, where each objective takes the minimum value of a predefined sub-function with respect to the $k$ parameters. This universal framework encompasses numerous clustering applications in machine learning and related fields. We develop efficient algorithms for solving sum-of-minimum optimization problems, inspired by a randomized initialization algorithm for the classic $k$-means (Arthur & Vassilvitskii, 2007) and Lloyd’s algorithm (Lloyd, 1982). We establish a new tight bound for the generalized initialization algorithm and prove a gradient-descent-like convergence rate for generalized Lloyd’s algorithm. The efficiency of our algorithms is numerically examined on multiple tasks, including generalized principal component analysis, mixed linear regression, and small-scale neural network training. Our approach compares favorably to previous ones based on simpler-but-less-precise optimization reformulations.'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24a/ding24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lisang
family: Ding
- given: Ziang
family: Chen
- given: Xinshang
family: Wang
- given: Wotao
family: Yin
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10927-10959
id: ding24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10927
lastpage: 10959
published: 2024-07-08 00:00:00 +0000
- title: 'AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training'
abstract: 'Low-bit integer training emerges as a promising approach to mitigate the heavy burden during network training by quantizing the weights, activations, and gradients. However, existing methods cannot well achieve mixed-precision quantization for low-bit training and are commonly limited to INT8 precision. In this paper, we propose a novel low-bit integer training framework that, for the first time, achieves adaptive mixed-precision allocation (AMPA) for weights, activations, and gradients, and pushes the boundaries to a precision level below INT8. We develop a novel magnitude-based sensitivity measurement based on the quantization losses of weights, activations, and gradients together with the average gradient magnitudes, which we theoretically show to be an upper bound on the influence of quantization. We further design a layer-wise precision update strategy based on observations of the quantization losses and their effects on model performance in low-bit training. Extensive experiments on different backbones and datasets show that, compared to INT8 quantization, the proposed method can achieve more than 38% BitOPs reduction with a tolerable loss below 2% in image classification, image segmentation, and language modeling.'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24b/ding24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Li
family: Ding
- given: Wen
family: Fei
- given: Yuyang
family: Huang
- given: Shuangrui
family: Ding
- given: Wenrui
family: Dai
- given: Chenglin
family: Li
- given: Junni
family: Zou
- given: Hongkai
family: Xiong
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10960-10977
id: ding24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10960
lastpage: 10977
published: 2024-07-08 00:00:00 +0000
- title: 'Understanding Forgetting in Continual Learning with Linear Regression'
abstract: 'Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to $\textit{catastrophic forgetting}$, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both underparameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence—where tasks with larger eigenvalues in their population data covariance matrices are trained later—tends to result in increased forgetting. Additionally, our findings highlight that an appropriate choice of step size will help mitigate forgetting in both underparameterized and overparameterized settings. To validate our theoretical analysis, we conducted simulation experiments on both linear regression models and Deep Neural Networks (DNNs). Results from these simulations substantiate our theoretical findings.'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24c/ding24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Meng
family: Ding
- given: Kaiyi
family: Ji
- given: Di
family: Wang
- given: Jinhui
family: Xu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 10978-11001
id: ding24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 10978
lastpage: 11001
published: 2024-07-08 00:00:00 +0000
- title: 'Recurrent Distance Filtering for Graph Representation Learning'
abstract: 'Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing the information from distant nodes effectively. Conversely, graph transformers allow each node to attend to all other nodes directly, but lack graph inductive bias and have to rely on ad-hoc positional encoding. In this paper, we propose a new architecture to reconcile these challenges. Our approach stems from the recent breakthroughs in long-range modeling provided by deep state-space models: for a given target node, our model aggregates other nodes by their shortest distances to the target and uses a linear RNN to encode the sequence of hop representations. The linear RNN is parameterized in a particular diagonal form for stable long-range signal propagation and is theoretically expressive enough to encode the neighborhood hierarchy. With no need for positional encoding, we empirically show that the performance of our model is comparable to or better than that of state-of-the-art graph transformers on various benchmarks, with a significantly reduced computational cost. Our code is open-source at https://github.com/skeletondyh/GRED.'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24d/ding24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuhui
family: Ding
- given: Antonio
family: Orvieto
- given: Bobby
family: He
- given: Thomas
family: Hofmann
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11002-11015
id: ding24d
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11002
lastpage: 11015
published: 2024-07-08 00:00:00 +0000
- title: 'Robust Stable Spiking Neural Networks'
abstract: 'Spiking neural networks (SNNs) are gaining popularity in deep learning due to their low energy budget on neuromorphic hardware. However, they still lack sufficient robustness to guard safety-critical applications such as autonomous driving. Many studies have been conducted to defend SNNs from the threat of adversarial attacks. This paper aims to uncover the robustness of SNNs through the lens of the stability of nonlinear systems. We are inspired by the fact that searching for parameters that alter the leaky integrate-and-fire dynamics can enhance their robustness. Thus, we dive into the dynamics of membrane potential perturbation and simplify its formulation. We show that membrane potential perturbation dynamics can reliably convey the intensity of perturbation. Our theoretical analyses imply that the simplified perturbation dynamics satisfy input-output stability. Thus, we propose a training framework with modified SNN neurons that reduces the mean square of membrane potential perturbation, aiming to enhance the robustness of SNNs. Finally, we experimentally verify the effectiveness of the framework in the setting of Gaussian noise training and adversarial training on the image classification task. Please refer to https://github.com/DingJianhao/stable-snn for our code implementation.'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24e.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24e/ding24e.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jianhao
family: Ding
- given: Zhiyu
family: Pan
- given: Yujia
family: Liu
- given: Zhaofei
family: Yu
- given: Tiejun
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11016-11029
id: ding24e
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11016
lastpage: 11029
published: 2024-07-08 00:00:00 +0000
- title: 'Fewer Truncations Improve Language Modeling'
abstract: 'In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity—it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and factually consistent content that is grounded in the complete context. To address the issue, we propose Best-fit Packing, a scalable and efficient method that packs documents into training sequences through length-aware combinatorial optimization. Our method completely eliminates unnecessary truncations while retaining the same training efficiency as concatenation. Empirical results from both text and code pre-training show that our method achieves superior performance (e.g., +4.7% on reading comprehension; +16.8% in context following; and +9.2% on program synthesis), and effectively reduces closed-domain hallucination by up to 58.3%.'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24f.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24f/ding24f.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hantian
family: Ding
- given: Zijian
family: Wang
- given: Giovanni
family: Paolini
- given: Varun
family: Kumar
- given: Anoop
family: Deoras
- given: Dan
family: Roth
- given: Stefano
family: Soatto
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11030-11048
id: ding24f
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11030
lastpage: 11048
published: 2024-07-08 00:00:00 +0000
- title: 'Delving into Differentially Private Transformer'
abstract: 'Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to ‘reduce’ the problem of training a DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such ‘reduction’ is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment to advance research in the field of differentially private deep learning.'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24g.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24g/ding24g.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Youlong
family: Ding
- given: Xueyang
family: Wu
- given: Yining
family: Meng
- given: Yonggang
family: Luo
- given: Hao
family: Wang
- given: Weike
family: Pan
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11049-11071
id: ding24g
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11049
lastpage: 11071
published: 2024-07-08 00:00:00 +0000
- title: 'Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization'
abstract: 'Reinforcement Learning from Human Feedback (RLHF) has shown potential in qualitative tasks where easily defined performance measures are lacking. However, RLHF is commonly used to optimize for average human preferences, which has drawbacks, especially in generative tasks that demand diverse model responses. Meanwhile, Quality Diversity (QD) algorithms excel at identifying diverse and high-quality solutions but often rely on manually crafted diversity metrics. This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions, thereby enhancing the applicability and effectiveness of QD algorithms in complex and open-ended domains. Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery and matches the efficacy of QD with manually crafted diversity metrics on standard benchmarks in robotics and reinforcement learning. Notably, in open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model and is more favorably received in user studies. We conclude by analyzing QDHF’s scalability, robustness, and quality of derived diversity metrics, emphasizing its strength in open-ended optimization tasks. Code and tutorials are available at https://liding.info/qdhf.'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24h.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24h/ding24h.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Li
family: Ding
- given: Jenny
family: Zhang
- given: Jeff
family: Clune
- given: Lee
family: Spector
- given: Joel
family: Lehman
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11072-11090
id: ding24h
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11072
lastpage: 11090
published: 2024-07-08 00:00:00 +0000
- title: 'LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens'
abstract: 'A large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with up to only 1k fine-tuning steps within 256k training lengths, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify and exploit two forms of non-uniformities in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a progressive extension strategy that first fine-tunes a 256k length LLM and then conducts a second positional interpolation on the fine-tuned extended LLM to achieve a 2048k context window; (iii) we readjust LongRoPE on 8k length to recover the short context window performance. Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of our method. Models extended via LongRoPE retain the original architecture with minor modifications to the positional embedding, and can reuse most pre-existing optimizations. Code is available at https://github.com/microsoft/LongRoPE'
volume: 235
URL: https://proceedings.mlr.press/v235/ding24i.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24i/ding24i.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-ding24i.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yiran
family: Ding
- given: Li Lyna
family: Zhang
- given: Chengruidong
family: Zhang
- given: Yuanyuan
family: Xu
- given: Ning
family: Shang
- given: Jiahang
family: Xu
- given: Fan
family: Yang
- given: Mao
family: Yang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11091-11104
id: ding24i
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11091
lastpage: 11104
published: 2024-07-08 00:00:00 +0000
- title: 'Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds'
abstract: 'In recent years, interest in gradient-based optimization over Riemannian manifolds has surged. However, a significant challenge lies in the reliance on hyperparameters, especially the learning rate, which requires meticulous tuning by practitioners to ensure convergence at a suitable rate. In this work, we introduce innovative learning-rate-free algorithms for stochastic optimization over Riemannian manifolds, eliminating the need for hand-tuning and providing a more robust and user-friendly approach. We establish high probability convergence guarantees that are optimal, up to logarithmic factors, compared to the best-known optimally tuned rate in the deterministic setting. Our approach is validated through numerical experiments, demonstrating competitive performance against learning-rate-dependent algorithms.'
volume: 235
URL: https://proceedings.mlr.press/v235/dodd24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dodd24a/dodd24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dodd24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniel
family: Dodd
- given: Louis
family: Sharrock
- given: Christopher
family: Nemeth
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11105-11148
id: dodd24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11105
lastpage: 11148
published: 2024-07-08 00:00:00 +0000
- title: 'Consistent Adversarially Robust Linear Classification: Non-Parametric Setting'
abstract: 'For binary classification in $d$ dimensions, it is known that with a sample size of $n$, an excess adversarial risk of $O(d/n)$ is achievable under strong parametric assumptions about the underlying data distribution (e.g., assuming a Gaussian mixture model). In the case of well-separated distributions, this rate can be further refined to $O(1/n)$. Our work studies the non-parametric setting, where very little is known. With only mild regularity conditions on the conditional distribution of the features, we examine adversarial attacks with respect to arbitrary norms and introduce a straightforward yet effective estimator with provable consistency w.r.t. adversarial risk. Our estimator is given by minimizing a series of smoothed versions of the robust 0/1 loss, with a smoothing bandwidth that adapts to both $n$ and $d$. Furthermore, we demonstrate that our estimator can achieve the minimax excess adversarial risk of $\widetilde O(\sqrt{d/n})$ for linear classifiers, at the cost of solving possibly rougher optimization problems.'
volume: 235
URL: https://proceedings.mlr.press/v235/dohmatob24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dohmatob24a/dohmatob24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dohmatob24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elvis
family: Dohmatob
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11149-11164
id: dohmatob24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11149
lastpage: 11164
published: 2024-07-08 00:00:00 +0000
- title: 'A Tale of Tails: Model Collapse as a Change of Scaling Laws'
abstract: 'As AI model size grows, neural *scaling laws* have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data. In this paper we ask: *How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus?* Will future models still improve, or be doomed to degenerate up to total *(model) collapse*? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the large language model Llama2.'
volume: 235
URL: https://proceedings.mlr.press/v235/dohmatob24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dohmatob24b/dohmatob24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dohmatob24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elvis
family: Dohmatob
- given: Yunzhen
family: Feng
- given: Pu
family: Yang
- given: Francois
family: Charton
- given: Julia
family: Kempe
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11165-11197
id: dohmatob24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11165
lastpage: 11197
published: 2024-07-08 00:00:00 +0000
- title: 'Precise Accuracy / Robustness Tradeoffs in Regression: Case of General Norms'
abstract: 'In this paper, we investigate the impact of test-time adversarial attacks on linear regression models and determine the optimal level of robustness that any model can reach while maintaining a given level of standard predictive performance (accuracy). Through quantitative estimates, we uncover fundamental tradeoffs between adversarial robustness and accuracy in different regimes. We obtain a precise characterization which distinguishes between regimes where robustness is achievable without hurting standard accuracy and regimes where a tradeoff might be unavoidable. Our findings are empirically confirmed with simple experiments that represent a variety of settings. This work covers feature covariance matrices and attack norms of any nature, extending previous works in this area.'
volume: 235
URL: https://proceedings.mlr.press/v235/dohmatob24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dohmatob24c/dohmatob24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dohmatob24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elvis
family: Dohmatob
- given: Meyer
family: Scetbon
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11198-11226
id: dohmatob24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11198
lastpage: 11226
published: 2024-07-08 00:00:00 +0000
- title: 'Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions'
abstract: 'The performance of optimization methods is often tied to the spectrum of the objective Hessian. Yet, conventional assumptions, such as smoothness, often do not enable us to make fine-grained convergence statements—particularly not for non-convex problems. Striving for a more intricate characterization of complexity, we introduce a unique concept termed graded non-convexity. This allows us to partition the class of non-convex problems into a nested chain of subclasses. Interestingly, many traditional non-convex objectives, including partially convex problems, matrix factorizations, and neural networks, fall within these subclasses. As a second contribution, we propose gradient methods with spectral preconditioning, which employ inexact top eigenvectors of the Hessian to address the ill-conditioning of the problem, contingent on the grade. Our analysis reveals that these new methods provide provably superior convergence rates compared to basic gradient descent on applicable problem classes, particularly when large gaps exist between the top eigenvalues of the Hessian. Our theory is validated by numerical experiments executed on multiple practical machine learning problems.'
volume: 235
URL: https://proceedings.mlr.press/v235/doikov24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/doikov24a/doikov24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-doikov24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nikita
family: Doikov
- given: Sebastian U
family: Stich
- given: Martin
family: Jaggi
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11227-11252
id: doikov24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11227
lastpage: 11252
published: 2024-07-08 00:00:00 +0000
- title: 'Impact of Decentralized Learning on Player Utilities in Stackelberg Games'
abstract: 'When deployed in the world, a learning agent such as a recommender system or a chatbot often repeatedly interacts with another learning agent (such as a user) over time. In many such two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned. To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent’s objective. We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks (such as Stackelberg equilibrium payoffs) result in worst-case linear regret for at least one player. To better capture these systems, we construct a relaxed regret benchmark that is tolerant to small learning errors by agents. We show that standard learning algorithms fail to provide sublinear regret, and we develop algorithms to achieve near-optimal $\mathcal{O}(T^{2/3})$ regret for both players with respect to these benchmarks. We further design relaxed environments under which faster learning ($\mathcal{O}(\sqrt{T})$) is possible. Altogether, our results take a step towards assessing how two-agent interactions in sequential and decentralized learning environments affect the utility of both agents.'
volume: 235
URL: https://proceedings.mlr.press/v235/donahue24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/donahue24a/donahue24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-donahue24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kate
family: Donahue
- given: Nicole
family: Immorlica
- given: Meena
family: Jagadeesan
- given: Brendan
family: Lucier
- given: Aleksandrs
family: Slivkins
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11253-11310
id: donahue24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11253
lastpage: 11310
published: 2024-07-08 00:00:00 +0000
- title: 'Towards Generalization beyond Pointwise Learning: A Unified Information-theoretic Perspective'
abstract: 'The recent surge in contrastive learning has intensified the interest in understanding the generalization of non-pointwise learning paradigms. While information-theoretic analysis achieves remarkable success in characterizing the generalization behavior of learning algorithms, its applicability is largely confined to pointwise learning, with extensions to the simplest pairwise settings remaining unexplored due to the challenges of non-i.i.d. losses and dimensionality explosion. In this paper, we develop the first series of information-theoretic bounds extending beyond pointwise scenarios, encompassing pointwise, pairwise, triplet, quadruplet, and higher-order scenarios, all within a unified framework. Specifically, our hypothesis-based bounds elucidate the generalization behavior of iterative and noisy learning algorithms via gradient covariance analysis, and our prediction-based bounds accurately estimate the generalization gap with computationally tractable low-dimensional information metrics. Comprehensive numerical studies then demonstrate the effectiveness of our bounds in capturing the generalization dynamics across diverse learning scenarios.'
volume: 235
URL: https://proceedings.mlr.press/v235/dong24a.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dong24a/dong24a.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dong24a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuxin
family: Dong
- given: Tieliang
family: Gong
- given: Hong
family: Chen
- given: Zhongjiang
family: He
- given: Mengxiang
family: Li
- given: Shuangyong
family: Song
- given: Chen
family: Li
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11311-11345
id: dong24a
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11311
lastpage: 11345
published: 2024-07-08 00:00:00 +0000
- title: 'Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models'
abstract: 'Despite their remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate inference, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. However, these metrics require the involvement of human experts and tedious trial and error. To efficiently identify superior pruning metrics, we develop an automatic framework for searching symbolic pruning metrics using genetic programming. In particular, we devise an elaborate search space encompassing the existing pruning metrics to discover the potential symbolic pruning metric. We propose an opposing operation simplification strategy to increase the diversity of the population. In this way, Pruner-Zero allows auto-generation of symbolic pruning metrics. Based on the searched results, we explore the correlation between pruning metrics and performance after pruning and summarize some principles. Extensive experiments on LLaMA and LLaMA-2 on language modeling and zero-shot tasks demonstrate that our Pruner-Zero obtains superior performance compared to SOTA post-training pruning methods. Code at: https://github.com/pprp/Pruner-Zero.'
volume: 235
URL: https://proceedings.mlr.press/v235/dong24b.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dong24b/dong24b.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dong24b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Peijie
family: Dong
- given: Lujun
family: Li
- given: Zhenheng
family: Tang
- given: Xiang
family: Liu
- given: Xinglin
family: Pan
- given: Qiang
family: Wang
- given: Xiaowen
family: Chu
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11346-11374
id: dong24b
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11346
lastpage: 11374
published: 2024-07-08 00:00:00 +0000
- title: 'Position: Building Guardrails for Large Language Models Requires Systematic Design'
abstract: 'As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI), and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate for a systematic approach to construct guardrails for LLMs, based on comprehensive consideration of diverse contexts across various LLM applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of the requirements, and developing verification and testing to ensure the utmost quality of the final product.'
volume: 235
URL: https://proceedings.mlr.press/v235/dong24c.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dong24c/dong24c.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dong24c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yi
family: Dong
- given: Ronghui
family: Mu
- given: Gaojie
family: Jin
- given: Yi
family: Qi
- given: Jinwei
family: Hu
- given: Xingyu
family: Zhao
- given: Jie
family: Meng
- given: Wenjie
family: Ruan
- given: Xiaowei
family: Huang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11375-11394
id: dong24c
issued:
date-parts:
- 2024
- 7
- 8
firstpage: 11375
lastpage: 11394
published: 2024-07-08 00:00:00 +0000
- title: 'Accelerating PDE Data Generation via Differential Operator Action in Solution Space'
abstract: 'Recent advancements in data-driven approaches, such as Neural Operator (NO), have demonstrated their effectiveness in reducing the solving time of Partial Differential Equations (PDEs). However, one major challenge faced by these approaches is the requirement for a large amount of high-precision training data, which incurs significant computational cost during the generation process. To address this challenge, we propose a novel PDE dataset generation algorithm, namely **Diff**erential **O**perator **A**ction in **S**olution space (**DiffOAS**), which speeds up the data generation process and enhances the precision of the generated data simultaneously. Specifically, DiffOAS obtains a few basic PDE solutions and then combines them to obtain new solutions. It applies differential operators on these solutions, a process we call ''operator action'', to efficiently generate precise PDE data points. Theoretical analysis shows that the time complexity of the DiffOAS method is one order lower than that of the existing generation method. Experimental results show that DiffOAS accelerates the generation of large-scale datasets with 10,000 instances by 300 times. Even with just 5% of the generation time, NO trained on the data generated by DiffOAS exhibits comparable performance to that using the existing generation method, which highlights the efficiency of DiffOAS.'
volume: 235
URL: https://proceedings.mlr.press/v235/dong24d.html
PDF: https://raw.githubusercontent.com/mlresearch/v235/main/assets/dong24d/dong24d.pdf
edit: https://github.com/mlresearch//v235/edit/gh-pages/_posts/2024-07-08-dong24d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 41st International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Huanshuo
family: Dong
- given: Hong
family: Wang
- given: Haoyang
family: Liu
- given: Jian
family: Luo
- given: Jie
family: Wang
editor:
- given: Ruslan
family: Salakhutdinov
- given: Zico
family: Kolter
- given: Katherine
family: Heller
- given: Adrian
family: Weller
- given: Nuria
family: Oliver
- given: Jonathan
family: Scarlett
- given: Felix
family: Berkenkamp
page: 11395-11411
id: dong24d
issued:
date-parts:
- 2024
- 7
- 8
firstpag