- title: 'PAC-Bayesian Bounds on Rate-Efficient Classifiers' abstract: 'We derive analytic bounds on the noise invariance of majority vote classifiers operating on compressed inputs. Specifically, starting from recent bounds on the true risk of majority vote classifiers, we extend the applicability of PAC-Bayesian theory to quantify the resilience of majority votes to input noise stemming from compression. The derived bounds are intuitive in binary classification settings, where they can be measured as expressions of voter differentials and voter pair agreement. By combining measures of input distortion with analytic guarantees on noise invariance, we prescribe rate-efficient machines to compress inputs without affecting subsequent classification. Our validation shows how bounding noise invariance can inform the compression stage for any majority vote classifier such that worst-case implications of bad input reconstructions are known, and inputs can be compressed to the minimum amount of information needed prior to inference.' volume: 162 URL: https://proceedings.mlr.press/v162/abbas22a.html PDF: https://proceedings.mlr.press/v162/abbas22a/abbas22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-abbas22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alhabib family: Abbas - given: Yiannis family: Andreopoulos editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1-9 id: abbas22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1 lastpage: 9 published: 2022-06-28 00:00:00 +0000 - title: 'Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning' abstract: 'Model-agnostic meta learning (MAML) is currently one of the dominating approaches for few-shot meta-learning. Albeit its effectiveness, the optimization of MAML can be challenging due to the innate bilevel problem structure. Specifically, the loss landscape of MAML is much more complex with possibly more saddle points and local minimizers than its empirical risk minimization counterpart. To address this challenge, we leverage the recently invented sharpness-aware minimization and develop a sharpness-aware MAML approach that we term Sharp-MAML. We empirically demonstrate that Sharp-MAML and its computation-efficient variant can outperform the plain-vanilla MAML baseline (e.g., +3% accuracy on Mini-Imagenet). We complement the empirical study with the convergence rate analysis and the generalization bound of Sharp-MAML. To the best of our knowledge, this is the first empirical and theoretical study on sharpness-aware minimization in the context of bilevel learning.' 
volume: 162 URL: https://proceedings.mlr.press/v162/abbas22b.html PDF: https://proceedings.mlr.press/v162/abbas22b/abbas22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-abbas22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Momin family: Abbas - given: Quan family: Xiao - given: Lisha family: Chen - given: Pin-Yu family: Chen - given: Tianyi family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10-32 id: abbas22b issued: date-parts: - 2022 - 6 - 28 firstpage: 10 lastpage: 32 published: 2022-06-28 00:00:00 +0000 - title: 'An Initial Alignment between Neural Network and Target is Needed for Gradient Descent to Learn' abstract: 'This paper introduces the notion of “Initial Alignment” (INAL) between a neural network at initialization and a target function. It is proved that if a network and a Boolean target function do not have a noticeable INAL, then noisy gradient descent with normalized i.i.d. initialization will not learn in polynomial time. Thus a certain amount of knowledge about the target (measured by the INAL) is needed in the architecture design. This also provides an answer to an open problem posed in (AS-NeurIPS’20). The results are based on deriving lower-bounds for descent algorithms on symmetric neural networks without explicit knowledge of the target function beyond its INAL.' volume: 162 URL: https://proceedings.mlr.press/v162/abbe22a.html PDF: https://proceedings.mlr.press/v162/abbe22a/abbe22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-abbe22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Emmanuel family: Abbe - given: Elisabetta family: Cornacchia - given: Jan family: Hazla - given: Christopher family: Marquis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 33-52 id: abbe22a issued: date-parts: - 2022 - 6 - 28 firstpage: 33 lastpage: 52 published: 2022-06-28 00:00:00 +0000 - title: 'Active Sampling for Min-Max Fairness' abstract: 'We propose simple active sampling and reweighting strategies for optimizing min-max fairness that can be applied to any classification or regression model learned via loss minimization. The key intuition behind our approach is to use at each timestep a datapoint from the group that is worst off under the current model for updating the model. The ease of implementation and the generality of our robust formulation make it an attractive option for improving model performance on disadvantaged groups. For convex learning problems, such as linear or logistic regression, we provide a fine-grained analysis, proving the rate of convergence to a min-max fair solution.' 
volume: 162 URL: https://proceedings.mlr.press/v162/abernethy22a.html PDF: https://proceedings.mlr.press/v162/abernethy22a/abernethy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-abernethy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jacob D family: Abernethy - given: Pranjal family: Awasthi - given: Matthäus family: Kleindessner - given: Jamie family: Morgenstern - given: Chris family: Russell - given: Jie family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 53-65 id: abernethy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 53 lastpage: 65 published: 2022-06-28 00:00:00 +0000 - title: 'Meaningfully debugging model mistakes using conceptual counterfactual explanations' abstract: 'Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases. However, this is often an ad hoc process that involves manually looking at the model’s mistakes on many test samples and guessing at the underlying reasons for those incorrect predictions. In this paper, we propose a systematic approach, conceptual counterfactual explanations (CCE), that explains why a classifier makes a mistake on a particular test sample(s) in terms of human-understandable concepts (e.g. this zebra is misclassified as a dog because of faint stripes). We base CCE on two prior ideas: counterfactual explanations and concept activation vectors, and validate our approach on well-known pretrained models, showing that it explains the models’ mistakes meaningfully. In addition, for new models trained on data with spurious correlations, CCE accurately identifies the spurious correlation as the cause of model mistakes from a single misclassified test sample. On two challenging medical applications, CCE generated useful insights, confirmed by clinicians, into biases and mistakes the model makes in real-world settings. The code for CCE is publicly available and can easily be applied to explain mistakes in new models.' volume: 162 URL: https://proceedings.mlr.press/v162/abid22a.html PDF: https://proceedings.mlr.press/v162/abid22a/abid22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-abid22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Abubakar family: Abid - given: Mert family: Yuksekgonul - given: James family: Zou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 66-88 id: abid22a issued: date-parts: - 2022 - 6 - 28 firstpage: 66 lastpage: 88 published: 2022-06-28 00:00:00 +0000 - title: 'Batched Dueling Bandits' abstract: 'The K-armed dueling bandit problem, where the feedback is in the form of noisy pairwise comparisons, has been widely studied. Previous works have only focused on the sequential setting where the policy adapts after every comparison. 
However, in many applications such as search ranking and recommendation systems, it is preferable to perform comparisons in a limited number of parallel batches. We study the batched K-armed dueling bandit problem under two standard settings: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity and stochastic triangle inequality. For both settings, we obtain algorithms with a smooth trade-off between the number of batches and regret. Our regret bounds match the best known sequential regret bounds (up to poly-logarithmic factors), using only a logarithmic number of batches. We complement our regret analysis with a nearly-matching lower bound. Finally, we also validate our theoretical results via experiments on synthetic and real data.' volume: 162 URL: https://proceedings.mlr.press/v162/agarwal22a.html PDF: https://proceedings.mlr.press/v162/agarwal22a/agarwal22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-agarwal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Arpit family: Agarwal - given: Rohan family: Ghuge - given: Viswanath family: Nagarajan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 89-110 id: agarwal22a issued: date-parts: - 2022 - 6 - 28 firstpage: 89 lastpage: 110 published: 2022-06-28 00:00:00 +0000 - title: 'Hierarchical Shrinkage: Improving the accuracy and interpretability of tree-based models.' abstract: 'Decision trees and random forests (RF) are a cornerstone of modern machine learning practice. Due to their tendency to overfit, trees are typically regularized by a variety of techniques that modify their structure (e.g. pruning). We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm which regularizes the tree not by altering its structure, but by shrinking the prediction over each leaf toward the sample means over each of its ancestors, with weights depending on a single regularization parameter and the number of samples in each ancestor. Since HS is a post-hoc method, it is extremely fast, compatible with any tree-growing algorithm and can be used synergistically with other regularization techniques. Extensive experiments over a wide variety of real-world datasets show that HS substantially increases the predictive performance of decision trees even when used in conjunction with other regularization techniques. Moreover, we find that applying HS to individual trees in an RF often improves its accuracy and interpretability by simplifying and stabilizing decision boundaries and SHAP values. We further explain HS by showing that it is equivalent to ridge regression on a basis that is constructed of decision stumps associated to the internal nodes of a tree. 
All code and models are released in a full-fledged package available on Github' volume: 162 URL: https://proceedings.mlr.press/v162/agarwal22b.html PDF: https://proceedings.mlr.press/v162/agarwal22b/agarwal22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-agarwal22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Abhineet family: Agarwal - given: Yan Shuo family: Tan - given: Omer family: Ronen - given: Chandan family: Singh - given: Bin family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 111-135 id: agarwal22b issued: date-parts: - 2022 - 6 - 28 firstpage: 111 lastpage: 135 published: 2022-06-28 00:00:00 +0000 - title: 'Deep equilibrium networks are sensitive to initialization statistics' abstract: 'Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives us a practical prescription for initializations which allow for training with a broader range of initial weight scales.' volume: 162 URL: https://proceedings.mlr.press/v162/agarwala22a.html PDF: https://proceedings.mlr.press/v162/agarwala22a/agarwala22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-agarwala22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Atish family: Agarwala - given: Samuel S family: Schoenholz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 136-160 id: agarwala22a issued: date-parts: - 2022 - 6 - 28 firstpage: 136 lastpage: 160 published: 2022-06-28 00:00:00 +0000 - title: 'Learning of Cluster-based Feature Importance for Electronic Health Record Time-series' abstract: 'The recent availability of Electronic Health Records (EHR) has allowed for the development of algorithms predicting inpatient risk of deterioration and trajectory evolution. However, prediction of disease progression with EHR is challenging since these data are sparse, heterogeneous, multi-dimensional, and multi-modal time-series. As such, clustering is regularly used to identify similar groups within the patient cohort to improve prediction. Current models have shown some success in obtaining cluster representations of patient trajectories. However, they i) fail to obtain clinical interpretability for each cluster, and ii) struggle to learn meaningful cluster numbers in the context of imbalanced distribution of disease outcomes. We propose a supervised deep learning model to cluster EHR data based on the identification of clinically understandable phenotypes with regard to both outcome prediction and patient trajectory. 
We introduce novel loss functions to address the problems of class imbalance and cluster collapse, and furthermore propose a feature-time attention mechanism to identify cluster-based phenotype importance across time and feature dimensions. We tested our model in two datasets corresponding to distinct medical settings. Our model yielded added interpretability to cluster formation and outperformed benchmarks by at least 4% in relevant metrics.' volume: 162 URL: https://proceedings.mlr.press/v162/aguiar22a.html PDF: https://proceedings.mlr.press/v162/aguiar22a/aguiar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-aguiar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Henrique family: Aguiar - given: Mauro family: Santos - given: Peter family: Watkinson - given: Tingting family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 161-179 id: aguiar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 161 lastpage: 179 published: 2022-06-28 00:00:00 +0000 - title: 'On the Convergence of the Shapley Value in Parametric Bayesian Learning Games' abstract: 'Measuring contributions is a classical problem in cooperative game theory where the Shapley value is the most well-known solution concept. In this paper, we establish the convergence property of the Shapley value in parametric Bayesian learning games where players perform a Bayesian inference using their combined data, and the posterior-prior KL divergence is used as the characteristic function. We show that for any two players, under some regularity conditions, their difference in Shapley value converges in probability to the difference in Shapley value of a limiting game whose characteristic function is proportional to the log-determinant of the joint Fisher information. As an application, we present an online collaborative learning framework that is asymptotically Shapley-fair. Our result enables this to be achieved without any costly computations of posterior-prior KL divergences. Only a consistent estimator of the Fisher information is needed. The effectiveness of our framework is demonstrated with experiments using real-world data.' volume: 162 URL: https://proceedings.mlr.press/v162/agussurja22a.html PDF: https://proceedings.mlr.press/v162/agussurja22a/agussurja22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-agussurja22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lucas family: Agussurja - given: Xinyi family: Xu - given: Bryan Kian Hsiang family: Low editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 180-196 id: agussurja22a issued: date-parts: - 2022 - 6 - 28 firstpage: 180 lastpage: 196 published: 2022-06-28 00:00:00 +0000 - title: 'Individual Preference Stability for Clustering' abstract: 'In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. 
Our notion can be motivated from several perspectives, including game theory and algorithmic fairness. We study several questions related to our proposed notion. We first show that deciding whether a given data set allows for an IP-stable clustering in general is NP-hard. As a result, we explore the design of efficient algorithms for finding IP-stable clusterings in some restricted metric spaces. We present a polytime algorithm to find a clustering satisfying exact IP-stability on the real line, and an efficient algorithm to find an IP-stable 2-clustering for a tree metric. We also consider relaxing the stability constraint, i.e., every data point should not be too far from its own cluster compared to any other cluster. For this case, we provide polytime algorithms with different guarantees. We evaluate some of our algorithms and several standard clustering approaches on real data sets.' volume: 162 URL: https://proceedings.mlr.press/v162/ahmadi22a.html PDF: https://proceedings.mlr.press/v162/ahmadi22a/ahmadi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ahmadi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Saba family: Ahmadi - given: Pranjal family: Awasthi - given: Samir family: Khuller - given: Matthäus family: Kleindessner - given: Jamie family: Morgenstern - given: Pattara family: Sukprasert - given: Ali family: Vakilian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 197-246 id: ahmadi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 197 lastpage: 246 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding the unstable convergence of gradient descent' abstract: 'Most existing analyses of (stochastic) gradient descent rely on the condition that for $L$-smooth costs, the step size is less than $2/L$. However, many works have observed that in machine learning applications step sizes often do not fulfill this condition, yet (stochastic) gradient descent still converges, albeit in an unstable manner. We investigate this unstable convergence phenomenon from first principles, and discuss key causes behind it. We also identify its main characteristics, and how they interrelate based on both theory and experiments, offering a principled view toward understanding the phenomenon.' volume: 162 URL: https://proceedings.mlr.press/v162/ahn22a.html PDF: https://proceedings.mlr.press/v162/ahn22a/ahn22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ahn22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kwangjun family: Ahn - given: Jingzhao family: Zhang - given: Suvrit family: Sra editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 247-257 id: ahn22a issued: date-parts: - 2022 - 6 - 28 firstpage: 247 lastpage: 257 published: 2022-06-28 00:00:00 +0000 - title: 'Minimum Cost Intervention Design for Causal Effect Identification' abstract: 'Pearl’s do calculus is a complete axiomatic approach to learn the identifiable causal effects from observational data. 
When such an effect is not identifiable, it is necessary to perform a collection of often costly interventions in the system to learn the causal effect. In this work, we consider the problem of designing the collection of interventions with the minimum cost to identify the desired effect. First, we prove that this problem is NP-complete, and subsequently propose an algorithm that can either find the optimal solution or a logarithmic-factor approximation of it. This is done by establishing a connection between our problem and the minimum hitting set problem. Additionally, we propose several polynomial time heuristic algorithms to tackle the computational complexity of the problem. Although these algorithms could potentially stumble on sub-optimal solutions, our simulations show that they achieve small regrets on random graphs.' volume: 162 URL: https://proceedings.mlr.press/v162/akbari22a.html PDF: https://proceedings.mlr.press/v162/akbari22a/akbari22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-akbari22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sina family: Akbari - given: Jalal family: Etesami - given: Negar family: Kiyavash editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 258-289 id: akbari22a issued: date-parts: - 2022 - 6 - 28 firstpage: 258 lastpage: 289 published: 2022-06-28 00:00:00 +0000 - title: 'How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models' abstract: 'Devising domain- and model-agnostic evaluation metrics for generative models is an important and as yet unresolved problem. Most existing metrics, which were tailored solely to the image synthesis setup, exhibit a limited capacity for diagnosing the different modes of failure of generative models across broader application domains. In this paper, we introduce a 3-dimensional evaluation metric, ($\alpha$-Precision, $\beta$-Recall, Authenticity), that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion. Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity. We introduce generalization as an additional, independent dimension (to the fidelity-diversity trade-off) that quantifies the extent to which a model copies training data{—}a crucial performance indicator when modeling sensitive data with requirements on privacy. The three metric components correspond to (interpretable) probabilistic quantities, and are estimated via sample-level binary classification. The sample-level nature of our metric inspires a novel use case which we call model auditing, wherein we judge the quality of individual samples generated by a (black-box) model, discarding low-quality samples and hence improving the overall model performance in a post-hoc manner.' 
volume: 162 URL: https://proceedings.mlr.press/v162/alaa22a.html PDF: https://proceedings.mlr.press/v162/alaa22a/alaa22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-alaa22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ahmed family: Alaa - given: Boris family: Van Breugel - given: Evgeny S. family: Saveliev - given: Mihaela prefix: van der family: Schaar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 290-306 id: alaa22a issued: date-parts: - 2022 - 6 - 28 firstpage: 290 lastpage: 306 published: 2022-06-28 00:00:00 +0000 - title: 'A Natural Actor-Critic Framework for Zero-Sum Markov Games' abstract: 'We introduce algorithms based on natural actor-critic and analyze their sample complexity for solving two player zero-sum Markov games in the tabular case. Our results improve the best-known sample complexities of policy gradient/actor-critic methods for convergence to Nash equilibrium in the multi-agent setting. We use the error propagation scheme in approximate dynamic programming, recent advances for global convergence of policy gradient methods, temporal difference learning, and techniques from stochastic primal-dual optimization. Our algorithms feature two stages, requiring agents to agree on an etiquette before starting their interactions, which is feasible for instance in self-play. However, the agents only have access to the joint reward and joint next state and not to each other’s actions or policies. Our complexity results match the best-known results for global convergence of policy gradient algorithms for single agent RL. We provide numerical verification of our methods for a two player bandit environment and a two player game, Alesia. We observe improved empirical performance as compared to the recently proposed optimistic gradient descent-ascent variant for Markov games.' volume: 162 URL: https://proceedings.mlr.press/v162/alacaoglu22a.html PDF: https://proceedings.mlr.press/v162/alacaoglu22a/alacaoglu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-alacaoglu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ahmet family: Alacaoglu - given: Luca family: Viano - given: Niao family: He - given: Volkan family: Cevher editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 307-366 id: alacaoglu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 307 lastpage: 366 published: 2022-06-28 00:00:00 +0000 - title: 'Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations' abstract: 'Due to the computational cost of running inference for a neural network, the need to deploy the inferential steps on a third party’s compute environment or hardware is common. If the third party is not fully trusted, it is desirable to obfuscate the nature of the inputs and outputs, so that the third party can not easily determine what specific task is being performed. 
Provably secure protocols for leveraging an untrusted party exist but are too computationally demanding to run in practice. We instead explore a different strategy of fast, heuristic security that we call Connectionist Symbolic Pseudo Secrets. By leveraging Holographic Reduced Representations (HRRs), we create a neural network with a pseudo-encryption style defense that empirically shows robustness to attack, even under threat models that unrealistically favor the adversary.' volume: 162 URL: https://proceedings.mlr.press/v162/alam22a.html PDF: https://proceedings.mlr.press/v162/alam22a/alam22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-alam22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mohammad Mahmudul family: Alam - given: Edward family: Raff - given: Tim family: Oates - given: James family: Holt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 367-393 id: alam22a issued: date-parts: - 2022 - 6 - 28 firstpage: 367 lastpage: 393 published: 2022-06-28 00:00:00 +0000 - title: 'Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer' abstract: 'In many real-world applications, reinforcement learning (RL) agents might have to solve multiple tasks, each one typically modeled via a reward function. If reward functions are expressed linearly, and the agent has previously learned a set of policies for different tasks, successor features (SFs) can be exploited to combine such policies and identify reasonable solutions for new problems. However, the identified solutions are not guaranteed to be optimal. We introduce a novel algorithm that addresses this limitation. It allows RL agents to combine existing policies and directly identify optimal policies for arbitrary new problems, without requiring any further interactions with the environment. We first show (under mild assumptions) that the transfer learning problem tackled by SFs is equivalent to the problem of learning to optimize multiple objectives in RL. We then introduce an SF-based extension of the Optimistic Linear Support algorithm to learn a set of policies whose SFs form a convex coverage set. We prove that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible tasks, without requiring any additional training samples. We empirically show that our method outperforms state-of-the-art competing algorithms both in discrete and continuous domains under value function approximation.' volume: 162 URL: https://proceedings.mlr.press/v162/alegre22a.html PDF: https://proceedings.mlr.press/v162/alegre22a/alegre22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-alegre22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lucas Nunes family: Alegre - given: Ana family: Bazzan - given: Bruno C. 
Da family: Silva editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 394-413 id: alegre22a issued: date-parts: - 2022 - 6 - 28 firstpage: 394 lastpage: 413 published: 2022-06-28 00:00:00 +0000 - title: 'Structured Stochastic Gradient MCMC' abstract: 'Stochastic gradient Markov Chain Monte Carlo (SGMCMC) is a scalable algorithm for asymptotically exact Bayesian inference in parameter-rich models, such as Bayesian neural networks. However, since mixing can be slow in high dimensions, practitioners often resort to variational inference (VI). Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. To relax these assumptions, this work proposes a new non-parametric variational inference scheme that combines ideas from both SGMCMC and coordinate-ascent VI. The approach relies on a new Langevin-type algorithm that operates on a "self-averaged" posterior energy function, where parts of the latent variables are averaged over samples from earlier iterations of the Markov chain. This way, statistical dependencies between coordinates can be broken in a controlled way, allowing the chain to mix faster. This scheme can be further modified in a "dropout" manner, leading to even more scalability. We test our scheme for ResNet-20 on CIFAR-10, SVHN, and FMNIST. In all cases, we find improvements in convergence speed and/or final accuracy compared to SGMCMC and parametric VI.' volume: 162 URL: https://proceedings.mlr.press/v162/alexos22a.html PDF: https://proceedings.mlr.press/v162/alexos22a/alexos22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-alexos22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Antonios family: Alexos - given: Alex J family: Boyd - given: Stephan family: Mandt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 414-434 id: alexos22a issued: date-parts: - 2022 - 6 - 28 firstpage: 414 lastpage: 434 published: 2022-06-28 00:00:00 +0000 - title: 'XAI for Transformers: Better Explanations through Conservative Propagation' abstract: 'Transformers have become an important workhorse of machine learning, with numerous applications. This necessitates the development of reliable methods for increasing their transparency. Multiple interpretability methods, often based on gradient information, have been proposed. We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. We identify Attention Heads and LayerNorm as main reasons for such unreliable explanations and propose a more stable way for propagation through these layers. Our proposal, which can be seen as a proper extension of the well-established LRP method to Transformers, is shown both theoretically and empirically to overcome the deficiency of a simple gradient-based approach, and achieves state-of-the-art explanation performance on a broad range of Transformer models and datasets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ali22a.html PDF: https://proceedings.mlr.press/v162/ali22a/ali22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ali22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ameen family: Ali - given: Thomas family: Schnake - given: Oliver family: Eberle - given: Grégoire family: Montavon - given: Klaus-Robert family: Müller - given: Lior family: Wolf editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 435-451 id: ali22a issued: date-parts: - 2022 - 6 - 28 firstpage: 435 lastpage: 451 published: 2022-06-28 00:00:00 +0000 - title: 'RUMs from Head-to-Head Contests' abstract: 'Random utility models (RUMs) encode the likelihood that a particular item will be selected from a slate of competing items. RUMs are well-studied objects in both discrete choice theory and, more recently, in the machine learning community, as they encode a fairly broad notion of rational user behavior. In this paper, we focus on slates of size two representing head-to-head contests. Given a tournament matrix $M$ such that $M_{i,j}$ is the probability that item $j$ will be selected from $\{i, j\}$, we consider the problem of finding the RUM that most closely reproduces $M$. For this problem we obtain a polynomial-time algorithm returning a RUM that approximately minimizes the average error over the pairs. Our experiments show that RUMs can perfectly represent many of the tournament matrices that have been considered in the literature; in fact, the maximum average error induced by RUMs on the matrices we considered is negligible ($\approx 0.001$). We also show that RUMs are competitive, on prediction tasks, with previous approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/almanza22a.html PDF: https://proceedings.mlr.press/v162/almanza22a/almanza22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-almanza22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Matteo family: Almanza - given: Flavio family: Chierichetti - given: Ravi family: Kumar - given: Alessandro family: Panconesi - given: Andrew family: Tomkins editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 452-467 id: almanza22a issued: date-parts: - 2022 - 6 - 28 firstpage: 452 lastpage: 467 published: 2022-06-28 00:00:00 +0000 - title: 'Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval' abstract: 'Retrieval-based language models (R-LM) model the probability of natural language text by combining a standard language model (LM) with examples retrieved from an external datastore at test time. While effective, a major bottleneck of using these models in practice is the computationally costly datastore search, which can be performed as frequently as every time step. In this paper, we present RetoMaton - retrieval automaton - which approximates the datastore search, based on (1) saving pointers between consecutive datastore entries, and (2) clustering of entries into "states". 
This effectively results in a weighted finite automaton built on top of the datastore, instead of representing the datastore as a flat list. The creation of the automaton is unsupervised, and a RetoMaton can be constructed from any text collection: either the original training corpus or from another domain. Traversing this automaton at inference time, in parallel to the LM inference, reduces its perplexity by up to 1.85, or alternatively saves up to 83% of the nearest neighbor searches over $k$NN-LM (Khandelwal et al., 2020) without hurting perplexity. Our code and trained models are available at https://github.com/neulab/retomaton .' volume: 162 URL: https://proceedings.mlr.press/v162/alon22a.html PDF: https://proceedings.mlr.press/v162/alon22a/alon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-alon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Uri family: Alon - given: Frank family: Xu - given: Junxian family: He - given: Sudipta family: Sengupta - given: Dan family: Roth - given: Graham family: Neubig editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 468-485 id: alon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 468 lastpage: 485 published: 2022-06-28 00:00:00 +0000 - title: 'Minimax Classification under Concept Drift with Multidimensional Adaptation and Performance Guarantees' abstract: 'The statistical characteristics of instance-label pairs often change with time in practical scenarios of supervised classification. Conventional learning techniques adapt to such concept drift accounting for a scalar rate of change by means of a carefully chosen learning rate, forgetting factor, or window size. However, the time changes in common scenarios are multidimensional, i.e., different statistical characteristics often change in a different manner. This paper presents adaptive minimax risk classifiers (AMRCs) that account for multidimensional time changes by means of a multivariate and high-order tracking of the time-varying underlying distribution. In addition, differently from conventional techniques, AMRCs can provide computable tight performance guarantees. Experiments on multiple benchmark datasets show the classification improvement of AMRCs compared to the state-of-the-art and the reliability of the presented performance guarantees.' 
volume: 162 URL: https://proceedings.mlr.press/v162/alvarez22a.html PDF: https://proceedings.mlr.press/v162/alvarez22a/alvarez22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-alvarez22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Verónica family: Álvarez - given: Santiago family: Mazuelas - given: Jose A family: Lozano editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 486-499 id: alvarez22a issued: date-parts: - 2022 - 6 - 28 firstpage: 486 lastpage: 499 published: 2022-06-28 00:00:00 +0000 - title: 'Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation' abstract: 'Bayesian Optimization (BO) has shown great promise for the global optimization of functions that are expensive to evaluate, but despite many successes, standard approaches can struggle in high dimensions. To improve the performance of BO, prior work suggested incorporating gradient information into a Gaussian process surrogate of the objective, giving rise to kernel matrices of size $nd \times nd$ for $n$ observations in $d$ dimensions. Naïvely multiplying with (resp. inverting) these matrices requires $O(n^2d^2)$ (resp. $O(n^3d^3)$) operations, which becomes infeasible for moderate dimensions and sample sizes. Here, we observe that a wide range of kernels gives rise to structured matrices, enabling an exact $O(n^2d)$ matrix-vector multiply for gradient observations and $O(n^2d^2)$ for Hessian observations. Beyond canonical kernel classes, we derive a programmatic approach to leveraging this type of structure for transformations and combinations of the discussed kernel classes, which constitutes a structure-aware automatic differentiation algorithm. Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels without any additional derivations, enabling flexible, problem-dependent modeling while scaling first-order BO to high $d$.' volume: 162 URL: https://proceedings.mlr.press/v162/ament22a.html PDF: https://proceedings.mlr.press/v162/ament22a/ament22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ament22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sebastian E family: Ament - given: Carla P family: Gomes editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 500-516 id: ament22a issued: date-parts: - 2022 - 6 - 28 firstpage: 500 lastpage: 516 published: 2022-06-28 00:00:00 +0000 - title: 'Public Data-Assisted Mirror Descent for Private Model Training' abstract: 'In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) 
We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by the public data as the mirror map. We show that, for linear regression with feature vectors drawn from a non-isotropic sub-Gaussian distribution, our algorithm, PDA-DPMD (a variant of mirror descent), provides population risk guarantees that are asymptotically better than the best known guarantees under DP (without having access to public data), when the number of public data samples is sufficiently large. We further show that our algorithm has natural “noise stability” properties that control the variance due to noise added to ensure DP. We demonstrate the efficacy of our algorithm by showing privacy/utility trade-offs on four benchmark datasets (StackOverflow, WikiText-2, CIFAR-10, and EMNIST). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but to our knowledge is the first to improve over DP-SGD on models that have been pre-trained with public data.' volume: 162 URL: https://proceedings.mlr.press/v162/amid22a.html PDF: https://proceedings.mlr.press/v162/amid22a/amid22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-amid22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ehsan family: Amid - given: Arun family: Ganesh - given: Rajiv family: Mathews - given: Swaroop family: Ramaswamy - given: Shuang family: Song - given: Thomas family: Steinke - given: Vinith M family: Suriyakumar - given: Om family: Thakkar - given: Abhradeep family: Thakurta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 517-535 id: amid22a issued: date-parts: - 2022 - 6 - 28 firstpage: 517 lastpage: 535 published: 2022-06-28 00:00:00 +0000 - title: 'On Last-Iterate Convergence Beyond Zero-Sum Games' abstract: 'Most existing results about last-iterate convergence of learning dynamics are limited to two-player zero-sum games, and only apply under rigid assumptions about what dynamics the players follow. In this paper we provide new results and techniques that apply to broader families of games and learning dynamics. First, we show that in a class of games that includes constant-sum polymatrix and strategically zero-sum games, the trajectories of dynamics such as optimistic mirror descent (OMD) exhibit a boundedness property, which holds even when players employ different algorithms and prediction mechanisms. This property enables us to obtain $O(1/\sqrt{T})$ rates and optimal $O(1)$ regret bounds. Our analysis also reveals a surprising property: OMD either reaches arbitrarily close to a Nash equilibrium or it outperforms the robust price of anarchy in efficiency. Moreover, for potential games we establish convergence to an $\epsilon$-equilibrium after $O(1/\epsilon^2)$ iterations for mirror descent under a broad class of regularizers, as well as optimal $O(1)$ regret bounds for OMD variants. Our framework also extends to near-potential games, and unifies known analyses for distributed learning in Fisher’s market model. Finally, we analyze the convergence, efficiency, and robustness of optimistic gradient descent (OGD) in general-sum continuous games.' 
volume: 162 URL: https://proceedings.mlr.press/v162/anagnostides22a.html PDF: https://proceedings.mlr.press/v162/anagnostides22a/anagnostides22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-anagnostides22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ioannis family: Anagnostides - given: Ioannis family: Panageas - given: Gabriele family: Farina - given: Tuomas family: Sandholm editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 536-581 id: anagnostides22a issued: date-parts: - 2022 - 6 - 28 firstpage: 536 lastpage: 581 published: 2022-06-28 00:00:00 +0000 - title: 'Online Algorithms with Multiple Predictions' abstract: 'This paper studies online algorithms augmented with multiple machine-learned predictions. We give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution that is competitive against the performance of the best solution obtained from the predictions. Our algorithm incorporates the use of predictions in the classic potential-based analysis of online algorithms. We apply our algorithmic framework to solve classical problems such as online set cover, (weighted) caching, and online facility location in the multiple predictions setting.' volume: 162 URL: https://proceedings.mlr.press/v162/anand22a.html PDF: https://proceedings.mlr.press/v162/anand22a/anand22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-anand22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Keerti family: Anand - given: Rong family: Ge - given: Amit family: Kumar - given: Debmalya family: Panigrahi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 582-598 id: anand22a issued: date-parts: - 2022 - 6 - 28 firstpage: 582 lastpage: 598 published: 2022-06-28 00:00:00 +0000 - title: 'Learning to Hash Robustly, Guaranteed' abstract: 'The indexing algorithms for the high-dimensional nearest neighbor search (NNS) with the best worst-case guarantees are based on the randomized Locality Sensitive Hashing (LSH), and its derivatives. In practice, many heuristic approaches exist to "learn" the best indexing method in order to speed-up NNS, crucially adapting to the structure of the given dataset. Oftentimes, these heuristics outperform the LSH-based algorithms on real datasets, but, almost always, come at the cost of losing the guarantees of either correctness or robust performance on adversarial queries, or apply to datasets with an assumed extra structure/model. In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms, while optimizing the hashing to the structure of the dataset (think instance-optimal algorithms) for performance on the minimum-performing query. We evaluate the algorithm’s ability to optimize for a given dataset both theoretically and practically. 
On the theoretical side, we exhibit a natural setting (dataset model) where our algorithm is much better than the standard theoretical one. On the practical side, we run experiments that show that our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/andoni22a.html PDF: https://proceedings.mlr.press/v162/andoni22a/andoni22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-andoni22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexandr family: Andoni - given: Daniel family: Beaglehole editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 599-618 id: andoni22a issued: date-parts: - 2022 - 6 - 28 firstpage: 599 lastpage: 618 published: 2022-06-28 00:00:00 +0000 - title: 'Set Based Stochastic Subsampling' abstract: 'Deep models are designed to operate on huge volumes of high dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g. classifier). In the first stage, we efficiently subsample candidate elements using conditionally independent Bernoulli random variables by capturing coarse grained global information using set encoding functions, followed by conditionally dependent autoregressive subsampling of the candidate elements using Categorical random variables by modeling pair-wise interactions using set attention networks in the second stage. We apply our method to feature and instance selection and show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification. Additionally, for nonparametric models such as Neural Processes that require to leverage the whole training data at inference time, we show that our method enhances the scalability of these models.' volume: 162 URL: https://proceedings.mlr.press/v162/andreis22a.html PDF: https://proceedings.mlr.press/v162/andreis22a/andreis22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-andreis22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bruno family: Andreis - given: Seanie family: Lee - given: A. Tuan family: Nguyen - given: Juho family: Lee - given: Eunho family: Yang - given: Sung Ju family: Hwang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 619-638 id: andreis22a issued: date-parts: - 2022 - 6 - 28 firstpage: 619 lastpage: 638 published: 2022-06-28 00:00:00 +0000 - title: 'Towards Understanding Sharpness-Aware Minimization' abstract: 'Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations which significantly improves generalization in various settings. 
We argue that the existing justifications for the success of SAM which are based on a PAC-Bayes generalization bound and the idea of convergence to flat minima are incomplete. Moreover, there are no explanations for the success of using m-sharpness in SAM which has been shown as essential for generalization. To better understand this aspect of SAM, we theoretically analyze its implicit bias for diagonal linear networks. We prove that SAM always chooses a solution that enjoys better generalization properties than standard gradient descent for a certain class of problems, and this effect is amplified by using m-sharpness. We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results of SAM for non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.' volume: 162 URL: https://proceedings.mlr.press/v162/andriushchenko22a.html PDF: https://proceedings.mlr.press/v162/andriushchenko22a/andriushchenko22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-andriushchenko22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Maksym family: Andriushchenko - given: Nicolas family: Flammarion editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 639-668 id: andriushchenko22a issued: date-parts: - 2022 - 6 - 28 firstpage: 639 lastpage: 668 published: 2022-06-28 00:00:00 +0000 - title: 'Fair and Fast k-Center Clustering for Data Summarization' abstract: 'We consider two key issues faced by many clustering methods when used for data summarization, namely (a) an unfair representation of "demographic groups” and (b) distorted summarizations, where data points in the summary represent subsets of the original data of vastly different sizes. Previous work made important steps towards handling separately each of these two issues in the context of the fundamental k-Center clustering objective through the study of fast algorithms for natural models that address them. We show that it is possible to effectively address both (a) and (b) simultaneously by presenting a clustering procedure that works for a canonical combined model and (i) is fast, both in theory and practice, (ii) exhibits a worst-case constant-factor guarantee, and (iii) gives promising computational results showing that there can be significant benefits in addressing both issues together instead of sequentially.' 
volume: 162 URL: https://proceedings.mlr.press/v162/angelidakis22a.html PDF: https://proceedings.mlr.press/v162/angelidakis22a/angelidakis22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-angelidakis22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haris family: Angelidakis - given: Adam family: Kurpisz - given: Leon family: Sering - given: Rico family: Zenklusen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 669-702 id: angelidakis22a issued: date-parts: - 2022 - 6 - 28 firstpage: 669 lastpage: 702 published: 2022-06-28 00:00:00 +0000 - title: 'Interactive Correlation Clustering with Existential Cluster Constraints' abstract: 'We consider the problem of clustering with user feedback. Existing methods express constraints about the input data points, most commonly through must-link and cannot-link constraints on data point pairs. In this paper, we introduce existential cluster constraints: a new form of feedback where users indicate the features of desired clusters. Specifically, users make statements about the existence of a cluster having (and not having) particular features. Our approach has multiple advantages: (1) constraints on clusters can express user intent more efficiently than point pairs; (2) in cases where the users’ mental model is of the desired clusters, it is more natural for users to express cluster-wise preferences; (3) it functions even when privacy restrictions prohibit users from seeing raw data. In addition to introducing existential cluster constraints, we provide an inference algorithm for incorporating our constraints into the output clustering. Finally, we demonstrate empirically that our proposed framework facilitates more accurate clustering with dramatically fewer user feedback inputs.' volume: 162 URL: https://proceedings.mlr.press/v162/angell22a.html PDF: https://proceedings.mlr.press/v162/angell22a/angell22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-angell22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rico family: Angell - given: Nicholas family: Monath - given: Nishant family: Yadav - given: Andrew family: Mccallum editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 703-716 id: angell22a issued: date-parts: - 2022 - 6 - 28 firstpage: 703 lastpage: 716 published: 2022-06-28 00:00:00 +0000 - title: 'Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging' abstract: 'Image-to-image regression is an important learning task, used frequently in biological imaging. Current algorithms, however, do not generally offer statistical guarantees that protect against a model’s mistakes and hallucinations. To address this, we develop uncertainty quantification techniques with rigorous statistical guarantees for image-to-image regression problems. 
In particular, we show how to derive uncertainty intervals around each pixel that are guaranteed to contain the true value with a user-specified confidence probability. Our methods work in conjunction with any base machine learning model, such as a neural network, and endow it with formal mathematical guarantees, regardless of the true unknown data distribution or choice of model. Furthermore, they are simple to implement and computationally inexpensive. We evaluate our procedure on three image-to-image regression tasks: quantitative phase microscopy, accelerated magnetic resonance imaging, and super-resolution transmission electron microscopy of a Drosophila melanogaster brain.' volume: 162 URL: https://proceedings.mlr.press/v162/angelopoulos22a.html PDF: https://proceedings.mlr.press/v162/angelopoulos22a/angelopoulos22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-angelopoulos22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Anastasios N family: Angelopoulos - given: Amit Pal family: Kohli - given: Stephen family: Bates - given: Michael family: Jordan - given: Jitendra family: Malik - given: Thayer family: Alshaabi - given: Srigokul family: Upadhyayula - given: Yaniv family: Romano editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 717-730 id: angelopoulos22a issued: date-parts: - 2022 - 6 - 28 firstpage: 717 lastpage: 730 published: 2022-06-28 00:00:00 +0000 - title: 'AdaGrad Avoids Saddle Points' abstract: 'Adaptive first-order methods in optimization have widespread ML applications due to their ability to adapt to non-convex landscapes. However, their convergence guarantees are typically stated in terms of vanishing gradient norms, which leaves open the issue of converging to undesirable saddle points (or even local maxima). In this paper, we focus on the AdaGrad family of algorithms - from scalar to full-matrix preconditioning - and we examine the question of whether the method’s trajectories avoid saddle points. A major challenge that arises here is that AdaGrad’s step-size (or, more accurately, the method’s preconditioner) evolves over time in a filtration-dependent way, i.e., as a function of all gradients observed in earlier iterations; as a result, avoidance results for methods with a constant or vanishing step-size do not apply. We resolve this challenge by combining a series of step-size stabilization arguments with a recursive representation of the AdaGrad preconditioner that allows us to employ center-stable techniques and ultimately show that the induced trajectories avoid saddle points from almost any initial condition.'
volume: 162 URL: https://proceedings.mlr.press/v162/antonakopoulos22a.html PDF: https://proceedings.mlr.press/v162/antonakopoulos22a/antonakopoulos22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-antonakopoulos22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kimon family: Antonakopoulos - given: Panayotis family: Mertikopoulos - given: Georgios family: Piliouras - given: Xiao family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 731-771 id: antonakopoulos22a issued: date-parts: - 2022 - 6 - 28 firstpage: 731 lastpage: 771 published: 2022-06-28 00:00:00 +0000 - title: 'UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees' abstract: 'Universal methods achieve optimal convergence rate guarantees in convex optimization without any prior knowledge of the problem’s regularity parameters or the attributes of the gradient oracle employed by the method. In this regard, existing state-of-the-art algorithms achieve an $O(1/T^2)$ convergence rate in Lipschitz smooth problems with a perfect gradient oracle, and an $O(1/\sqrt{T})$ convergence speed when the underlying problem is non-smooth and/or the gradient oracle is stochastic. On the downside, these methods do not take into account the dependence of these guarantees on the problem’s dimensionality, and this can have a catastrophic impact on a method’s convergence, in both theory and practice. Our paper aims to bridge this gap by providing a scalable universal method - dubbed UnDERGrad - which enjoys an almost dimension-free oracle complexity in problems with a favorable geometry (like the simplex, $\ell_1$-ball or trace-constraints), while retaining the order-optimal dependence on $T$ described above. These "best of both worlds" guarantees are achieved via a primal-dual update scheme inspired by the dual exploration method for variational inequalities.' volume: 162 URL: https://proceedings.mlr.press/v162/antonakopoulos22b.html PDF: https://proceedings.mlr.press/v162/antonakopoulos22b/antonakopoulos22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-antonakopoulos22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kimon family: Antonakopoulos - given: Dong Quan family: Vu - given: Volkan family: Cevher - given: Kfir family: Levy - given: Panayotis family: Mertikopoulos editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 772-795 id: antonakopoulos22b issued: date-parts: - 2022 - 6 - 28 firstpage: 772 lastpage: 795 published: 2022-06-28 00:00:00 +0000 - title: 'Adapting the Linearised Laplace Model Evidence for Modern Deep Learning' abstract: 'The linearised Laplace method for estimating model uncertainty has received renewed attention in the Bayesian deep learning community. The method provides reliable error bars and admits a closed-form expression for the model evidence, allowing for scalable selection of model hyperparameters.
In this work, we examine the assumptions behind this method, particularly in conjunction with model selection. We show that these interact poorly with some now-standard tools of deep learning–stochastic approximation methods and normalisation layers–and make recommendations for how to better adapt this classic method to the modern setting. We provide theoretical support for our recommendations and validate them empirically on MLPs, classic CNNs, residual networks with and without normalisation layers, generative autoencoders and transformers.' volume: 162 URL: https://proceedings.mlr.press/v162/antoran22a.html PDF: https://proceedings.mlr.press/v162/antoran22a/antoran22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-antoran22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Javier family: Antoran - given: David family: Janz - given: James U family: Allingham - given: Erik family: Daxberger - given: Riccardo Rb family: Barbano - given: Eric family: Nalisnick - given: Jose Miguel family: Hernandez-Lobato editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 796-821 id: antoran22a issued: date-parts: - 2022 - 6 - 28 firstpage: 796 lastpage: 821 published: 2022-06-28 00:00:00 +0000 - title: 'EAT-C: Environment-Adversarial sub-Task Curriculum for Efficient Reinforcement Learning' abstract: 'Reinforcement learning (RL) is inefficient on long-horizon tasks due to sparse rewards and its policy can be fragile to slightly perturbed environments. We address these challenges via a curriculum of tasks with coupled environments, generated by two policies trained jointly with RL: (1) a co-operative planning policy recursively decomposing a hard task into a coarse-to-fine sub-task tree; and (2) an adversarial policy modifying the environment in each sub-task. They are complementary to acquire more informative feedback for RL: (1) provides dense reward of easier sub-tasks while (2) modifies sub-tasks’ environments to be more challenging and diverse. Conversely, they are trained by RL’s dense feedback on sub-tasks so their generated curriculum keeps adaptive to RL’s progress. The sub-task tree enables an easy-to-hard curriculum for every policy: its top-down construction gradually increases sub-tasks the planner needs to generate, while the adversarial training between the environment and RL follows a bottom-up traversal that starts from a dense sequence of easier sub-tasks allowing more frequent environment changes. We compare EAT-C with RL/planning targeting similar problems and methods with environment generators or adversarial agents. Extensive experiments on diverse tasks demonstrate the advantages of our method on improving RL’s efficiency and generalization.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ao22a.html PDF: https://proceedings.mlr.press/v162/ao22a/ao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shuang family: Ao - given: Tianyi family: Zhou - given: Jing family: Jiang - given: Guodong family: Long - given: Xuan family: Song - given: Chengqi family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 822-843 id: ao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 822 lastpage: 843 published: 2022-06-28 00:00:00 +0000 - title: 'Online Balanced Experimental Design' abstract: 'We consider the experimental design problem in an online environment, an important practical task for reducing the variance of estimates in randomized experiments which allows for greater precision, and in turn, improved decision making. In this work, we present algorithms that build on recent advances in online discrepancy minimization which accommodate both arbitrary treatment probabilities and multiple treatments. The proposed algorithms are computationally efficient, minimize covariate imbalance, and include randomization which enables robustness to misspecification. We provide worst-case bounds on the expected mean squared error of the causal estimate and show that the proposed estimator is no worse than an implicit ridge regression; these bounds are within a logarithmic factor of the best known results for offline experimental design. We conclude with a detailed simulation study showing favorable results relative to complete randomization as well as to offline methods for experimental design with time complexities exceeding our algorithm, which has a linear dependence on the number of observations, by polynomial factors.' volume: 162 URL: https://proceedings.mlr.press/v162/arbour22a.html PDF: https://proceedings.mlr.press/v162/arbour22a/arbour22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-arbour22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: David family: Arbour - given: Drew family: Dimmery - given: Tung family: Mai - given: Anup family: Rao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 844-864 id: arbour22a issued: date-parts: - 2022 - 6 - 28 firstpage: 844 lastpage: 864 published: 2022-06-28 00:00:00 +0000 - title: 'VariGrow: Variational Architecture Growing for Task-Agnostic Continual Learning based on Bayesian Novelty' abstract: 'Continual Learning (CL) is the problem of sequentially learning a set of tasks and preserving all the knowledge acquired. Many existing methods assume that the data stream is explicitly divided into a sequence of known contexts (tasks), and use this information to know when to transfer knowledge from one context to another. Unfortunately, many real-world CL scenarios have no clear task nor context boundaries, motivating the study of task-agnostic CL, where neither the specific tasks nor their switches are known both in training and testing.
This paper proposes a variational architecture growing framework dubbed VariGrow. By interpreting dynamically growing neural networks as a Bayesian approximation, and defining flexible implicit variational distributions, VariGrow detects if a new task is arriving through an energy-based novelty score. If the novelty score is high and the sample is “detected" as a new task, VariGrow will grow a new expert module to be responsible for it. Otherwise, the sample will be assigned to one of the existing experts who is most “familiar" with it (i.e., one with the lowest novelty score). We have tested VariGrow on several CIFAR and ImageNet-based benchmarks for the strict task-agnostic CL setting and demonstrate its consistent superior performance. Perhaps surprisingly, its performance can even be competitive compared to task-aware methods.' volume: 162 URL: https://proceedings.mlr.press/v162/ardywibowo22a.html PDF: https://proceedings.mlr.press/v162/ardywibowo22a/ardywibowo22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ardywibowo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Randy family: Ardywibowo - given: Zepeng family: Huo - given: Zhangyang family: Wang - given: Bobak J family: Mortazavi - given: Shuai family: Huang - given: Xiaoning family: Qian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 865-877 id: ardywibowo22a issued: date-parts: - 2022 - 6 - 28 firstpage: 865 lastpage: 877 published: 2022-06-28 00:00:00 +0000 - title: 'Thresholded Lasso Bandit' abstract: 'In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on a few, say $s_0\ll d$, of these features only. We present Thresholded Lasso bandit, an algorithm that (i) estimates the vector defining the reward function as well as its sparse support, i.e., significant feature elements, using the Lasso framework with thresholding, and (ii) selects an arm greedily according to this estimate projected on its support. The algorithm does not require prior knowledge of the sparsity index $s_0$ and can be parameter-free under some symmetric assumptions. For this simple algorithm, we establish non-asymptotic regret upper bounds scaling as $\mathcal{O}( \log d + \sqrt{T} )$ in general, and as $\mathcal{O}( \log d + \log T)$ under the so-called margin condition (a probabilistic condition on the separation of the arm rewards). The regret of previous algorithms scales as $\mathcal{O}( \log d + \sqrt{T \log (d T)})$ and $\mathcal{O}( \log T \log d)$ in the two settings, respectively. Through numerical experiments, we confirm that our algorithm outperforms existing methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ariu22a.html PDF: https://proceedings.mlr.press/v162/ariu22a/ariu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ariu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kaito family: Ariu - given: Kenshi family: Abe - given: Alexandre family: Proutiere editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 878-928 id: ariu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 878 lastpage: 928 published: 2022-06-28 00:00:00 +0000 - title: 'Gradient Based Clustering' abstract: 'We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality with respect to cluster assignments and cluster center positions. The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions, satisfying some mild assumptions. The main advantage of the proposed approach is a simple and computationally cheap update rule. Unlike previous methods that specialize to a specific formulation of the clustering problem, our approach is applicable to a wide range of costs, including non-Bregman clustering methods based on the Huber loss. We analyze the convergence of the proposed algorithm, and show that it converges to the set of appropriately defined fixed points, under arbitrary center initialization. In the special case of Bregman cost functions, the algorithm converges to the set of centroidal Voronoi partitions, which is consistent with prior works. Numerical experiments on real data demonstrate the effectiveness of the proposed method.' volume: 162 URL: https://proceedings.mlr.press/v162/armacki22a.html PDF: https://proceedings.mlr.press/v162/armacki22a/armacki22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-armacki22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aleksandar family: Armacki - given: Dragana family: Bajovic - given: Dusan family: Jakovetic - given: Soummya family: Kar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 929-947 id: armacki22a issued: date-parts: - 2022 - 6 - 28 firstpage: 929 lastpage: 947 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Gradient Descent on the Edge of Stability in Deep Learning' abstract: 'Deep learning experiments by \citet{cohen2021gradient} using deterministic Gradient Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and sharpness (i.e., the largest eigenvalue of Hessian) no longer behave as in traditional optimization. Sharpness stabilizes around $2/$LR and loss goes up and down across iterations, yet still with an overall downward trend. The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss. 
This is in contrast to many previous results about implicit bias that rely either on infinitesimal updates or on noise in the gradient. Formally, for any smooth function $L$ with a certain regularity condition, this effect is demonstrated for (1) Normalized GD, i.e., GD with a varying LR $\eta_t = \frac{\eta}{\|\nabla L(x(t))\|}$ and loss $L$; (2) GD with constant LR and loss $\sqrt{L - \min_x L(x)}$. Both provably enter the Edge of Stability, with the associated flow on the manifold minimizing $\lambda_{1}(\nabla^2 L)$. The above theoretical results have been corroborated by an experimental study.' volume: 162 URL: https://proceedings.mlr.press/v162/arora22a.html PDF: https://proceedings.mlr.press/v162/arora22a/arora22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-arora22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sanjeev family: Arora - given: Zhiyuan family: Li - given: Abhishek family: Panigrahi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 948-1024 id: arora22a issued: date-parts: - 2022 - 6 - 28 firstpage: 948 lastpage: 1024 published: 2022-06-28 00:00:00 +0000 - title: 'Private optimization in the interpolation regime: faster rates and hardness results' abstract: 'In non-private stochastic convex optimization, stochastic gradient methods converge much faster on interpolation problems—namely, problems where there exists a solution that simultaneously minimizes all of the sample losses—than on non-interpolating ones; similar improvements are not known in the private setting. In this paper, we investigate differentially private stochastic optimization in the interpolation regime. First, we show that without additional assumptions, interpolation problems do not exhibit improved convergence rates with differential privacy. However, when the functions exhibit quadratic growth around the optimum, we show (near) exponential improvements in the private sample complexity. In particular, we propose an adaptive algorithm that improves the sample complexity to achieve expected error $\alpha$ from $\frac{d}{\varepsilon \sqrt{\alpha}}$ to $\frac{1}{\alpha^\rho} + \frac{d}{\varepsilon} \log\left(\frac{1}{\alpha}\right)$ for any fixed $\rho > 0$, while retaining the standard minimax-optimal sample complexity for non-interpolation problems. We prove a lower bound that shows the dimension-dependent term in the expression above is tight. Furthermore, we provide a superefficiency result which demonstrates the necessity of the polynomial term for adaptive algorithms: any algorithm that has a polylogarithmic sample complexity for interpolation problems cannot achieve the minimax-optimal rates for the family of non-interpolation problems.'
volume: 162 URL: https://proceedings.mlr.press/v162/asi22a.html PDF: https://proceedings.mlr.press/v162/asi22a/asi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-asi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hilal family: Asi - given: Karan family: Chadha - given: Gary family: Cheng - given: John family: Duchi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1025-1045 id: asi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1025 lastpage: 1045 published: 2022-06-28 00:00:00 +0000 - title: 'Optimal Algorithms for Mean Estimation under Local Differential Privacy' abstract: 'We study the problem of mean estimation of $\ell_2$-bounded vectors under the constraint of local differential privacy. While the literature has a variety of algorithms that achieve the (asymptotic) optimal rates for this problem, the performance of these algorithms in practice can vary significantly due to varying (and often large) hidden constants. In this work, we investigate the question of designing the randomizer with the smallest variance. We show that PrivUnit (Bhowmick et al. 2018) with optimized parameters achieves the optimal variance among a large family of natural randomizers. To prove this result, we establish some properties of local randomizers, and use symmetrization arguments that allow us to write the optimal randomizer as the optimizer of a certain linear program. These structural results, which should extend to other problems, then allow us to show that the optimal randomizer belongs to the PrivUnit family. We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees. This allows us to establish several useful properties on the exact constants of the optimal error as well as to numerically estimate these constants.' volume: 162 URL: https://proceedings.mlr.press/v162/asi22b.html PDF: https://proceedings.mlr.press/v162/asi22b/asi22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-asi22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hilal family: Asi - given: Vitaly family: Feldman - given: Kunal family: Talwar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1046-1056 id: asi22b issued: date-parts: - 2022 - 6 - 28 firstpage: 1046 lastpage: 1056 published: 2022-06-28 00:00:00 +0000 - title: 'Asymptotically-Optimal Gaussian Bandits with Side Observations' abstract: 'We study the problem of Gaussian bandits with general side information, as first introduced by Wu, Szepesvári, and György. In this setting, the play of an arm reveals information about other arms, according to an arbitrary a priori known side information matrix: each element of this matrix encodes the fidelity of the information that the “row" arm reveals about the “column" arm. In the case of Gaussian noise, this model subsumes standard bandits, full-feedback, and graph-structured feedback as special cases. 
In this work, we first construct an LP-based asymptotic instance-dependent lower bound on the regret. The LP optimizes the cost (regret) required to reliably estimate the suboptimality gap of each arm. This LP lower bound motivates our main contribution: the first known asymptotically optimal algorithm for this general setting.' volume: 162 URL: https://proceedings.mlr.press/v162/atsidakou22a.html PDF: https://proceedings.mlr.press/v162/atsidakou22a/atsidakou22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-atsidakou22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexia family: Atsidakou - given: Orestis family: Papadigenopoulos - given: Constantine family: Caramanis - given: Sujay family: Sanghavi - given: Sanjay family: Shakkottai editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1057-1077 id: atsidakou22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1057 lastpage: 1077 published: 2022-06-28 00:00:00 +0000 - title: 'Congested Bandits: Optimal Routing via Short-term Resets' abstract: 'For traffic routing platforms, the choice of which route to recommend to a user depends on the congestion on these routes – indeed, an individual’s utility depends on the number of people using the recommended route at that instance. Motivated by this, we introduce the problem of Congested Bandits where each arm’s reward is allowed to depend on the number of times it was played in the past $\Delta$ timesteps. This dependence on past history of actions leads to a dynamical system where an algorithm’s present choices also affect its future pay-offs, and requires an algorithm to plan for this. We study the congestion aware formulation in the multi-armed bandit (MAB) setup and in the contextual bandit setup with linear rewards. For the multi-armed setup, we propose a UCB style algorithm and show that its policy regret scales as $\tilde{O}(\sqrt{K \Delta T})$. For the linear contextual bandit setup, our algorithm, based on an iterative least squares planner, achieves policy regret $\tilde{O}(\sqrt{dT} + \Delta)$. From an experimental standpoint, we corroborate the no-regret properties of our algorithms via a simulation study.' volume: 162 URL: https://proceedings.mlr.press/v162/awasthi22a.html PDF: https://proceedings.mlr.press/v162/awasthi22a/awasthi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-awasthi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pranjal family: Awasthi - given: Kush family: Bhatia - given: Sreenivas family: Gollapudi - given: Kostas family: Kollias editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1078-1100 id: awasthi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1078 lastpage: 1100 published: 2022-06-28 00:00:00 +0000 - title: 'Do More Negative Samples Necessarily Hurt In Contrastive Learning?' 
abstract: 'Recent investigations in noise contrastive estimation suggest, both empirically as well as theoretically, that while having more “negative samples” in the contrastive loss improves downstream classification performance initially, beyond a threshold, it hurts downstream performance due to a “collision-coverage” trade-off. But is such a phenomenon inherent in contrastive learning? We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework, for noise contrastive estimation. We also provide empirical support for our theoretical results on CIFAR-10 and CIFAR-100 datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/awasthi22b.html PDF: https://proceedings.mlr.press/v162/awasthi22b/awasthi22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-awasthi22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pranjal family: Awasthi - given: Nishanth family: Dikkala - given: Pritish family: Kamath editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1101-1116 id: awasthi22b issued: date-parts: - 2022 - 6 - 28 firstpage: 1101 lastpage: 1116 published: 2022-06-28 00:00:00 +0000 - title: 'H-Consistency Bounds for Surrogate Loss Minimizers' abstract: 'We present a detailed study of estimation errors in terms of surrogate loss estimation errors. We refer to such guarantees as H-consistency bounds, since they account for the hypothesis set H adopted. These guarantees are significantly stronger than H-calibration or H-consistency. They are also more informative than similar excess error bounds derived in the literature, when H is the family of all measurable functions. We prove general theorems providing such guarantees, for both the distribution-dependent and distribution-independent settings. We show that our bounds are tight, modulo a convexity assumption. We also show that previous excess error bounds can be recovered as special cases of our general results. We then present a series of explicit bounds in the case of the zero-one loss, with multiple choices of the surrogate loss and for both the family of linear functions and neural networks with one hidden-layer. We further prove more favorable distribution-dependent guarantees in that case. We also present a series of explicit bounds in the case of the adversarial loss, with surrogate losses based on the supremum of the $\rho$-margin, hinge or sigmoid loss and for the same two general hypothesis sets. Here too, we prove several enhancements of these guarantees under natural distributional assumptions. Finally, we report the results of simulations illustrating our bounds and their tightness.' 
volume: 162 URL: https://proceedings.mlr.press/v162/awasthi22c.html PDF: https://proceedings.mlr.press/v162/awasthi22c/awasthi22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-awasthi22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pranjal family: Awasthi - given: Anqi family: Mao - given: Mehryar family: Mohri - given: Yutao family: Zhong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1117-1174 id: awasthi22c issued: date-parts: - 2022 - 6 - 28 firstpage: 1117 lastpage: 1174 published: 2022-06-28 00:00:00 +0000 - title: 'Iterative Hard Thresholding with Adaptive Regularization: Sparser Solutions Without Sacrificing Runtime' abstract: 'We propose a simple modification to the iterative hard thresholding (IHT) algorithm, which recovers asymptotically sparser solutions as a function of the condition number. When aiming to minimize a convex function f(x) with condition number $\kappa$ subject to x being an s-sparse vector, the standard IHT guarantee is a solution with relaxed sparsity $O(s\kappa^2)$, while our proposed algorithm, regularized IHT, returns a solution with sparsity $O(s\kappa)$. Our algorithm significantly improves over ARHT [Axiotis & Sviridenko, 2021] which also achieves $O(s\kappa)$, as it does not require re-optimization in each iteration (and so is much faster), is deterministic, and does not require knowledge of the optimal solution value f(x*) or the optimal sparsity level s. Our main technical tool is an adaptive regularization framework, in which the algorithm progressively learns the weights of an l_2 regularization term that will allow convergence to sparser solutions. We also apply this framework to low rank optimization, where we achieve a similar improvement of the best known condition number dependence from $\kappa^2$ to $\kappa$.' volume: 162 URL: https://proceedings.mlr.press/v162/axiotis22a.html PDF: https://proceedings.mlr.press/v162/axiotis22a/axiotis22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-axiotis22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kyriakos family: Axiotis - given: Maxim family: Sviridenko editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1175-1197 id: axiotis22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1175 lastpage: 1197 published: 2022-06-28 00:00:00 +0000 - title: 'Proving Theorems using Incremental Learning and Hindsight Experience Replay' abstract: 'Traditional automated theorem proving systems for first-order logic depend on speed-optimized search and many handcrafted heuristics designed to work over a wide range of domains. Machine learning approaches in the literature either depend on these traditional provers to bootstrap themselves, by leveraging these heuristics, or can struggle due to limited existing proof data. 
The latter issue can be explained by the lack of a smooth difficulty gradient in theorem proving datasets; large gaps in difficulty between different theorems can make training harder or even impossible. In this paper, we adapt the idea of hindsight experience replay from reinforcement learning to the automated theorem proving domain, so as to use the intermediate data generated during unsuccessful proof attempts. We build a first-order logic prover by disabling all the smart clause-scoring heuristics of the state-of-the-art E prover and replacing them with a clause-scoring neural network learned by using hindsight experience replay in an incremental learning setting. Clauses are represented as graphs and presented to transformer networks with spectral features. We show that provers trained in this way can outperform previous machine learning approaches and compete with the state of the art heuristic-based theorem prover E in its best configuration, on the popular benchmarks MPTP2078, M2k and Mizar40. The proofs generated by our algorithm are also almost always significantly shorter than E’s proofs.' volume: 162 URL: https://proceedings.mlr.press/v162/aygun22a.html PDF: https://proceedings.mlr.press/v162/aygun22a/aygun22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-aygun22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Eser family: Aygün - given: Ankit family: Anand - given: Laurent family: Orseau - given: Xavier family: Glorot - given: Stephen M family: Mcaleer - given: Vlad family: Firoiu - given: Lei M family: Zhang - given: Doina family: Precup - given: Shibl family: Mourad editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1198-1210 id: aygun22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1198 lastpage: 1210 published: 2022-06-28 00:00:00 +0000 - title: 'Near-optimal rate of consistency for linear models with missing values' abstract: 'Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...). In fact, the very nature of missing values usually prevents us from running standard learning algorithms. In this paper, we focus on the extensively-studied linear models, but in presence of missing values, which turns out to be quite a challenging task. Indeed, the Bayes predictor can be decomposed as a sum of predictors corresponding to each missing pattern. This eventually requires to solve a number of learning tasks, exponential in the number of input features, which makes predictions impossible for current real-world datasets. First, we propose a rigorous setting to analyze a least-square type estimator and establish a bound on the excess risk which increases exponentially in the dimension. Consequently, we leverage the missing data distribution to propose a new algorithm, and derive associated adaptive risk bounds that turn out to be minimax optimal. Numerical experiments highlight the benefits of our method compared to state-of-the-art algorithms used for predictions with missing values.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ayme22a.html PDF: https://proceedings.mlr.press/v162/ayme22a/ayme22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ayme22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexis family: Ayme - given: Claire family: Boyer - given: Aymeric family: Dieuleveut - given: Erwan family: Scornet editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1211-1243 id: ayme22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1211 lastpage: 1243 published: 2022-06-28 00:00:00 +0000 - title: 'How Tempering Fixes Data Augmentation in Bayesian Neural Networks' abstract: 'While Bayesian neural networks (BNNs) provide a sound and principled alternative to standard neural networks, an artificial sharpening of the posterior usually needs to be applied to reach comparable performance. This is in stark contrast to theory, dictating that given an adequate prior and a well-specified model, the untempered Bayesian posterior should achieve optimal performance. Despite the community’s extensive efforts, the observed gains in performance still remain disputed with several plausible causes pointing at its origin. While data augmentation has been empirically recognized as one of the main drivers of this effect, a theoretical account of its role, on the other hand, is largely missing. In this work we identify two interlaced factors concurrently influencing the strength of the cold posterior effect, namely the correlated nature of augmentations and the degree of invariance of the employed model to such transformations. By theoretically analyzing simplified settings, we prove that tempering implicitly reduces the misspecification arising from modeling augmentations as i.i.d. data. The temperature mimics the role of the effective sample size, reflecting the gain in information provided by the augmentations. We corroborate our theoretical findings with extensive empirical evaluations, scaling to realistic BNNs. By relying on the framework of group convolutions, we experiment with models of varying inherent degree of invariance, confirming its hypothesized relationship with the optimal temperature.' volume: 162 URL: https://proceedings.mlr.press/v162/bachmann22a.html PDF: https://proceedings.mlr.press/v162/bachmann22a/bachmann22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bachmann22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Gregor family: Bachmann - given: Lorenzo family: Noci - given: Thomas family: Hofmann editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1244-1260 id: bachmann22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1244 lastpage: 1260 published: 2022-06-28 00:00:00 +0000 - title: 'ASAP.SGD: Instance-based Adaptiveness to Staleness in Asynchronous SGD' abstract: 'Concurrent algorithmic implementations of Stochastic Gradient Descent (SGD) give rise to critical questions for compute-intensive Machine Learning (ML). 
Asynchrony implies speedup in some contexts, and challenges in others, as stale updates may lead to slower, or non-converging executions. While previous works showed asynchrony-adaptiveness can improve stability and speedup by reducing the step size for stale updates according to static rules, there is no one-size-fits-all adaptation rule, since the optimal strategy depends on several factors. We introduce (i) $\mathtt{ASAP.SGD}$, an analytical framework capturing necessary and desired properties of staleness-adaptive step size functions and (ii) \textsc{tail}-$\tau$, a method for utilizing key properties of the execution instance, generating a tailored strategy that not only dampens the impact of stale updates, but also leverages fresh ones. We recover convergence bounds for adaptiveness functions satisfying the $\mathtt{ASAP.SGD}$ conditions for general, convex and non-convex problems, and establish novel bounds for ones satisfying the Polyak-Lojasiewicz property. We evaluate \textsc{tail}-$\tau$ with representative AsyncSGD concurrent algorithms, for Deep Learning problems, showing \textsc{tail}-$\tau$ is a vital complement to AsyncSGD, with (i) persistent speedup in wall-clock convergence time in the parallelism spectrum, (ii) considerably lower risk of non-convergence, as well as (iii) precision levels for which original SGD implementations fail.' volume: 162 URL: https://proceedings.mlr.press/v162/backstrom22a.html PDF: https://proceedings.mlr.press/v162/backstrom22a/backstrom22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-backstrom22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Karl family: Bäckström - given: Marina family: Papatriantafilou - given: Philippas family: Tsigas editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1261-1276 id: backstrom22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1261 lastpage: 1276 published: 2022-06-28 00:00:00 +0000 - title: 'From Noisy Prediction to True Label: Noisy Prediction Calibration via Generative Model' abstract: 'Noisy labels are inevitable yet problematic in machine learning society. It ruins the generalization of a classifier by making the classifier over-fitted to noisy labels. Existing methods on noisy label have focused on modifying the classifier during the training procedure. It has two potential problems. First, these methods are not applicable to a pre-trained classifier without further access to training. Second, it is not easy to train a classifier and regularize all negative effects from noisy labels, simultaneously. We suggest a new branch of method, Noisy Prediction Calibration (NPC) in learning with noisy labels. Through the introduction and estimation of a new type of transition matrix via generative model, NPC corrects the noisy prediction from the pre-trained classifier to the true label as a post-processing scheme. We prove that NPC theoretically aligns with the transition matrix based methods. Yet, NPC empirically provides more accurate pathway to estimate true label, even without involvement in classifier learning. Also, NPC is applicable to any classifier trained with noisy label methods, if training instances and its predictions are available. 
Our method, NPC, boosts the classification performances of all baseline models on both synthetic and real-world datasets. The implemented code is available at https://github.com/BaeHeeSun/NPC.' volume: 162 URL: https://proceedings.mlr.press/v162/bae22a.html PDF: https://proceedings.mlr.press/v162/bae22a/bae22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bae22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Heesun family: Bae - given: Seungjae family: Shin - given: Byeonghu family: Na - given: Joonho family: Jang - given: Kyungwoo family: Song - given: Il-Chul family: Moon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1277-1297 id: bae22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1277 lastpage: 1297 published: 2022-06-28 00:00:00 +0000 - title: 'data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language' abstract: 'While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/baevski22a.html PDF: https://proceedings.mlr.press/v162/baevski22a/baevski22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-baevski22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexei family: Baevski - given: Wei-Ning family: Hsu - given: Qiantong family: Xu - given: Arun family: Babu - given: Jiatao family: Gu - given: Michael family: Auli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1298-1312 id: baevski22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1298 lastpage: 1312 published: 2022-06-28 00:00:00 +0000 - title: 'End-to-End Balancing for Causal Continuous Treatment-Effect Estimation' abstract: 'We study the problem of observational causal inference with continuous treatment. We focus on the challenge of estimating the causal response curve for infrequently-observed treatment values. We design a new algorithm based on the framework of entropy balancing which learns weights that directly maximize causal inference accuracy using end-to-end optimization. 
Our weights can be customized for different datasets and causal inference algorithms. We propose a new theory for consistency of entropy balancing for continuous treatments. Using synthetic and real-world data, we show that our proposed algorithm outperforms the entropy balancing in terms of causal inference accuracy.' volume: 162 URL: https://proceedings.mlr.press/v162/bahadori22a.html PDF: https://proceedings.mlr.press/v162/bahadori22a/bahadori22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bahadori22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Taha family: Bahadori - given: Eric Tchetgen family: Tchetgen - given: David family: Heckerman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1313-1326 id: bahadori22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1313 lastpage: 1326 published: 2022-06-28 00:00:00 +0000 - title: 'A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs' abstract: 'In this paper, we develop a new graph kernel, namely the Hierarchical Transitive-Aligned Kernel, by transitively aligning the vertices between graphs through a family of hierarchical prototype graphs. Comparing to most existing state-of-the-art graph kernels, the proposed kernel has three theoretical advantages. First, it incorporates the locational correspondence information between graphs into the kernel computation, and thus overcomes the shortcoming of ignoring structural correspondences arising in most R-convolution kernels. Second, it guarantees the transitivity between the correspondence information that is not available for most existing matching kernels. Third, it incorporates the information of all graphs under comparisons into the kernel computation process, and thus encapsulates richer characteristics. Experimental evaluations demonstrate the effectiveness of the new transitive-aligned kernel.' volume: 162 URL: https://proceedings.mlr.press/v162/bai22a.html PDF: https://proceedings.mlr.press/v162/bai22a/bai22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bai22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lu family: Bai - given: Lixin family: Cui - given: Hancock family: Edwin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1327-1336 id: bai22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1327 lastpage: 1336 published: 2022-06-28 00:00:00 +0000 - title: 'Near-Optimal Learning of Extensive-Form Games with Imperfect Information' abstract: 'This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. 
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating balanced exploration policies into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.' volume: 162 URL: https://proceedings.mlr.press/v162/bai22b.html PDF: https://proceedings.mlr.press/v162/bai22b/bai22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bai22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yu family: Bai - given: Chi family: Jin - given: Song family: Mei - given: Tiancheng family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1337-1382 id: bai22b issued: date-parts: - 2022 - 6 - 28 firstpage: 1337 lastpage: 1382 published: 2022-06-28 00:00:00 +0000 - title: 'Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification' abstract: 'Multi-label classification (MLC) is a prediction task where each sample can have more than one label. We propose a novel contrastive learning boosted multi-label prediction model based on a Gaussian mixture variational autoencoder (C-GMVAE), which learns a multimodal prior space and employs a contrastive loss. Many existing methods introduce extra complex neural modules like graph neural networks to capture the label correlations, in addition to the prediction modules. We find that by using contrastive learning in the supervised setting, we can exploit label information effectively in a data-driven manner, and learn meaningful feature and label embeddings which capture the label correlations and enhance the predictive power. Our method also adopts the idea of learning and aligning latent spaces for both features and labels. In contrast to previous works based on a unimodal prior, C-GMVAE imposes a Gaussian mixture structure on the latent space, to alleviate the posterior collapse and over-regularization issues. C-GMVAE outperforms existing methods on multiple public datasets and can often match other models’ full performance with only 50% of the training data. Furthermore, we show that the learnt embeddings provide insights into the interpretation of label-label interactions.' 
volume: 162 URL: https://proceedings.mlr.press/v162/bai22c.html PDF: https://proceedings.mlr.press/v162/bai22c/bai22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bai22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junwen family: Bai - given: Shufeng family: Kong - given: Carla P family: Gomes editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1383-1398 id: bai22c issued: date-parts: - 2022 - 6 - 28 firstpage: 1383 lastpage: 1398 published: 2022-06-28 00:00:00 +0000 - title: 'A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing' abstract: 'Recently, speech representation learning has improved many speech-related tasks such as speech recognition, speech classification, and speech-to-text translation. However, all the above tasks are in the direction of speech understanding, but for the inverse direction, speech synthesis, the potential of representation learning is yet to be realized, due to the challenging nature of generating high-quality speech. To address this problem, we propose our framework, Alignment-Aware Acoustic-Text Pretraining (A$^3$T), which reconstructs masked acoustic signals with text input and acoustic-text alignment during training. In this way, the pretrained model can generate high quality reconstructed spectrogram, which can be applied to the speech editing and unseen speaker TTS directly. Experiments show A$^3$T outperforms SOTA models on speech editing, and improves multi-speaker speech synthesis without the external speaker verification model.' volume: 162 URL: https://proceedings.mlr.press/v162/bai22d.html PDF: https://proceedings.mlr.press/v162/bai22d/bai22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bai22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: He family: Bai - given: Renjie family: Zheng - given: Junkun family: Chen - given: Mingbo family: Ma - given: Xintong family: Li - given: Liang family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1399-1411 id: bai22d issued: date-parts: - 2022 - 6 - 28 firstpage: 1399 lastpage: 1411 published: 2022-06-28 00:00:00 +0000 - title: 'Stability Based Generalization Bounds for Exponential Family Langevin Dynamics' abstract: 'Recent years have seen advances in generalization bounds for noisy stochastic algorithms, especially stochastic gradient Langevin dynamics (SGLD) based on stability (Mou et al., 2018; Li et al., 2020) and information theoretic approaches (Xu & Raginsky, 2017; Negrea et al., 2019; Steinke & Zakynthinou, 2020). In this paper, we unify and substantially generalize stability based generalization bounds and make three technical contributions. First, we bound the generalization error in terms of expected (not uniform) stability which arguably leads to quantitatively sharper bounds. 
Second, as our main contribution, we introduce Exponential Family Langevin Dynamics (EFLD), a substantial generalization of SGLD, which includes noisy versions of Sign-SGD and quantized SGD as special cases. We establish data dependent expected stability based generalization bounds for any EFLD algorithm with a O(1/n) sample dependence and dependence on gradient discrepancy rather than the norm of gradients, yielding significantly sharper bounds. Third, we establish optimization guarantees for special cases of EFLD. Further, empirical results on benchmarks illustrate that our bounds are non-vacuous, quantitatively sharper than existing bounds, and behave correctly under noisy labels.' volume: 162 URL: https://proceedings.mlr.press/v162/banerjee22a.html PDF: https://proceedings.mlr.press/v162/banerjee22a/banerjee22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-banerjee22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Arindam family: Banerjee - given: Tiancong family: Chen - given: Xinyan family: Li - given: Yingxue family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1412-1449 id: banerjee22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1412 lastpage: 1449 published: 2022-06-28 00:00:00 +0000 - title: 'Certified Neural Network Watermarks with Randomized Smoothing' abstract: 'Watermarking is a commonly used strategy to protect creators’ rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models – in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose the first certifiable watermarking method. Using the randomized smoothing technique, we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain $\ell_2$ threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/bansal22a.html PDF: https://proceedings.mlr.press/v162/bansal22a/bansal22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bansal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Arpit family: Bansal - given: Ping-Yeh family: Chiang - given: Michael J family: Curry - given: Rajiv family: Jain - given: Curtis family: Wigington - given: Varun family: Manjunatha - given: John P family: Dickerson - given: Tom family: Goldstein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1450-1465 id: bansal22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1450 lastpage: 1465 published: 2022-06-28 00:00:00 +0000 - title: 'Data Scaling Laws in NMT: The Effect of Noise and Architecture' abstract: 'In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT). First, we establish that the test loss of encoder-decoder transformer models scales as a power law in the number of training samples, with a dependence on the model size. Then, we systematically vary aspects of the training setup to understand how they impact the data scaling laws. In particular, we change the following (1) Architecture and task setup: We compare to a transformer-LSTM hybrid, and a decoder-only transformer with a language modeling loss (2) Noise level in the training distribution: We experiment with filtering, and adding iid synthetic noise. In all the above cases, we find that the data scaling exponents are minimally impacted, suggesting that marginally worse architectures or training data can be compensated for by adding more data. Lastly, we find that using back-translated data instead of parallel data, can significantly degrade the scaling exponent.' volume: 162 URL: https://proceedings.mlr.press/v162/bansal22b.html PDF: https://proceedings.mlr.press/v162/bansal22b/bansal22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bansal22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yamini family: Bansal - given: Behrooz family: Ghorbani - given: Ankush family: Garg - given: Biao family: Zhang - given: Colin family: Cherry - given: Behnam family: Neyshabur - given: Orhan family: Firat editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1466-1482 id: bansal22b issued: date-parts: - 2022 - 6 - 28 firstpage: 1466 lastpage: 1482 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Stable Classifiers by Transferring Unstable Features' abstract: 'While unbiased machine learning models are essential for many applications, bias is a human-defined concept that can vary across tasks. Given only input-label pairs, algorithms may lack sufficient information to distinguish stable (causal) features from unstable (spurious) features. However, related tasks often share similar biases – an observation we may leverage to develop stable classifiers in the transfer setting. 
In this work, we explicitly inform the target classifier about unstable features in the source tasks. Specifically, we derive a representation that encodes the unstable features by contrasting different data environments in the source task. We achieve robustness by clustering data of the target task according to this representation and minimizing the worst-case risk across these clusters. We evaluate our method on both text and image classification tasks. Empirical results demonstrate that our algorithm is able to maintain robustness on the target task for both synthetically generated environments and real-world environments. Our code is available at https://github.com/YujiaBao/Tofu.' volume: 162 URL: https://proceedings.mlr.press/v162/bao22a.html PDF: https://proceedings.mlr.press/v162/bao22a/bao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yujia family: Bao - given: Shiyu family: Chang - given: Regina family: Barzilay editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1483-1507 id: bao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1483 lastpage: 1507 published: 2022-06-28 00:00:00 +0000 - title: 'Fast Composite Optimization and Statistical Recovery in Federated Learning' abstract: 'As a prevalent distributed learning paradigm, Federated Learning (FL) trains a global model on a massive number of devices with infrequent communication. This paper investigates a class of composite optimization and statistical recovery problems in the FL setting, whose loss function consists of a data-dependent smooth loss and a non-smooth regularizer. Examples include sparse linear regression using Lasso, low-rank matrix recovery using nuclear norm regularization, etc. In the existing literature, federated composite optimization algorithms are designed only from an optimization perspective without any statistical guarantees. In addition, they do not consider the (restricted) strong convexity commonly used in statistical recovery problems. We advance the frontiers of this problem from both optimization and statistical perspectives. On the optimization front, we propose a new algorithm named Fast Federated Dual Averaging for strongly convex and smooth losses and establish state-of-the-art iteration and communication complexity in the composite setting. In particular, we prove that it enjoys a fast rate, linear speedup, and reduced communication rounds. On the statistical front, for restricted strongly convex and smooth losses, we design another algorithm, namely Multi-stage Federated Dual Averaging, and prove a high probability complexity bound with linear speedup up to optimal statistical precision. Numerical experiments on both synthetic and real data demonstrate that our methods perform better than other baselines. To the best of our knowledge, this is the first work providing fast optimization algorithms and statistical recovery guarantees for composite problems in FL.'
volume: 162 URL: https://proceedings.mlr.press/v162/bao22b.html PDF: https://proceedings.mlr.press/v162/bao22b/bao22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bao22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yajie family: Bao - given: Michael family: Crawshaw - given: Shan family: Luo - given: Mingrui family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1508-1536 id: bao22b issued: date-parts: - 2022 - 6 - 28 firstpage: 1508 lastpage: 1536 published: 2022-06-28 00:00:00 +0000 - title: 'Generative Modeling for Multi-task Visual Learning' abstract: 'Generative modeling has recently shown great promise in computer vision, but it has mostly focused on synthesizing visually realistic images. In this paper, motivated by multi-task learning of shareable feature representations, we consider a novel problem of learning a shared generative model that is useful across various visual perception tasks. Correspondingly, we propose a general multi-task oriented generative modeling (MGM) framework, by coupling a discriminative multi-task network with a generative network. While it is challenging to synthesize both RGB images and pixel-level annotations in multi-task scenarios, our framework enables us to use synthesized images paired with only weak annotations (i.e., image-level scene labels) to facilitate multiple visual tasks. Experimental evaluation on challenging multi-task benchmarks, including NYUv2 and Taskonomy, demonstrates that our MGM framework improves the performance of all the tasks by large margins, consistently outperforming state-of-the-art multi-task approaches in different sample-size regimes.' volume: 162 URL: https://proceedings.mlr.press/v162/bao22c.html PDF: https://proceedings.mlr.press/v162/bao22c/bao22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bao22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhipeng family: Bao - given: Martial family: Hebert - given: Yu-Xiong family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1537-1554 id: bao22c issued: date-parts: - 2022 - 6 - 28 firstpage: 1537 lastpage: 1554 published: 2022-06-28 00:00:00 +0000 - title: 'Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models' abstract: 'Diffusion probabilistic models (DPMs) are a class of powerful deep generative models (DGMs). Despite their success, the iterative generation process over the full timesteps is much less efficient than other DGMs such as GANs. Thus, the generation performance on a subset of timesteps is crucial, which is greatly influenced by the covariance design in DPMs. In this work, we consider diagonal and full covariances to improve the expressive power of DPMs. We derive the optimal result for such covariances, and then correct it when the mean of DPMs is imperfect. Both the optimal and the corrected ones can be decomposed into terms of conditional expectations over functions of noise. 
Building upon it, we propose to estimate the optimal covariance and its correction given imperfect mean by learning these conditional expectations. Our method can be applied to DPMs with both discrete and continuous timesteps. We consider the diagonal covariance in our implementation for computational efficiency. For an efficient practical implementation, we adopt a parameter sharing scheme and a two-stage training process. Empirically, our method outperforms a wide variety of covariance design on likelihood results, and improves the sample quality especially on a small number of timesteps.' volume: 162 URL: https://proceedings.mlr.press/v162/bao22d.html PDF: https://proceedings.mlr.press/v162/bao22d/bao22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bao22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fan family: Bao - given: Chongxuan family: Li - given: Jiacheng family: Sun - given: Jun family: Zhu - given: Bo family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1555-1584 id: bao22d issued: date-parts: - 2022 - 6 - 28 firstpage: 1555 lastpage: 1584 published: 2022-06-28 00:00:00 +0000 - title: 'On the Surrogate Gap between Contrastive and Supervised Losses' abstract: 'Contrastive representation learning encourages data representation to make semantically similar pairs closer than randomly drawn negative samples, which has been successful in various domains such as vision, language, and graphs. Recent theoretical studies have attempted to explain the benefit of the large negative sample size by upper-bounding the downstream classification loss with the contrastive loss. However, the previous surrogate bounds have two drawbacks: they are only legitimate for a limited range of negative sample sizes and prohibitively large even within that range. Due to these drawbacks, there still does not exist a consensus on how negative sample size theoretically correlates with downstream classification performance. Following the simplified setting where positive pairs are drawn from the true distribution (not generated by data augmentation; as supposed in previous studies), this study establishes surrogate upper and lower bounds for the downstream classification loss for all negative sample sizes that best explain the empirical observations on the negative sample size in the earlier studies. Our bounds suggest that the contrastive loss can be viewed as a surrogate objective of the downstream loss and larger negative sample sizes improve downstream classification because the surrogate gap between contrastive and supervised losses decays. We verify that our theory is consistent with experiments on synthetic, vision, and language datasets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/bao22e.html PDF: https://proceedings.mlr.press/v162/bao22e/bao22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bao22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Han family: Bao - given: Yoshihiro family: Nagano - given: Kento family: Nozawa editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1585-1606 id: bao22e issued: date-parts: - 2022 - 6 - 28 firstpage: 1585 lastpage: 1606 published: 2022-06-28 00:00:00 +0000 - title: 'Representation Topology Divergence: A Method for Comparing Neural Network Representations.' abstract: 'Comparison of data representations is a complex multi-aspect problem. We propose a method for comparing two data representations. We introduce the Representation Topology Divergence (RTD) score measuring the dissimilarity in multi-scale topology between two point clouds of equal size with a one-to-one correspondence between points. The two data point clouds can lie in different ambient spaces. The RTD score is one of the few practical methods based on topological data analysis that are applicable to real machine learning datasets. Experiments show the agreement of RTD with the intuitive assessment of data representation similarity. The proposed RTD score is sensitive to the data representation’s fine topological structure. We use the RTD score to gain insights into neural network representations in computer vision and NLP domains for various problems: training dynamics analysis, data distribution shift, transfer learning, ensemble learning, and disentanglement assessment.' volume: 162 URL: https://proceedings.mlr.press/v162/barannikov22a.html PDF: https://proceedings.mlr.press/v162/barannikov22a/barannikov22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-barannikov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Serguei family: Barannikov - given: Ilya family: Trofimov - given: Nikita family: Balabin - given: Evgeny family: Burnaev editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1607-1626 id: barannikov22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1607 lastpage: 1626 published: 2022-06-28 00:00:00 +0000 - title: 'Sparse Mixed Linear Regression with Guarantees: Taming an Intractable Problem with Invex Relaxation' abstract: 'In this paper, we study the problem of sparse mixed linear regression on an unlabeled dataset that is generated from linear measurements from two different regression parameter vectors. Since the data is unlabeled, our task is not only to find a good approximation of the regression parameter vectors but also to label the dataset correctly. In its original form, this problem is NP-hard. The most popular algorithms to solve this problem (such as Expectation-Maximization) tend to get stuck at local minima. We provide a novel invex relaxation for this intractable problem which leads to a solution with provable theoretical guarantees. This relaxation enables exact recovery of data labels.
Furthermore, we recover close approximation of regression parameter vectors which match the true parameter vectors in support and sign. Our formulation uses a carefully constructed primal dual witnesses framework for the invex problem. Furthermore, we show that the sample complexity of our method is only logarithmic in terms of the dimension of the regression parameter vectors.' volume: 162 URL: https://proceedings.mlr.press/v162/barik22a.html PDF: https://proceedings.mlr.press/v162/barik22a/barik22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-barik22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adarsh family: Barik - given: Jean family: Honorio editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1627-1646 id: barik22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1627 lastpage: 1646 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Fisher Discriminant Analysis: Optimal Neural Network Embeddings in Polynomial Time' abstract: 'Fisher’s Linear Discriminant Analysis (FLDA) is a statistical analysis method that linearly embeds data points to a lower dimensional space to maximize a discrimination criterion such that the variance between classes is maximized while the variance within classes is minimized. We introduce a natural extension of FLDA that employs neural networks, called Neural Fisher Discriminant Analysis (NFDA). This method finds the optimal two-layer neural network that embeds data points to optimize the same discrimination criterion. We use tools from convex optimization to transform the optimal neural network embedding problem into a convex problem. The resulting problem is easy to interpret and solve to global optimality. We evaluate the method’s performance on synthetic and real datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/bartan22a.html PDF: https://proceedings.mlr.press/v162/bartan22a/bartan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bartan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Burak family: Bartan - given: Mert family: Pilanci editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1647-1663 id: bartan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1647 lastpage: 1663 published: 2022-06-28 00:00:00 +0000 - title: 'Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic Games' abstract: 'This paper proposes an extension of a popular decentralized discrete-time learning procedure when repeating a static game called fictitious play (FP) (Brown, 1951; Robinson, 1951) to a dynamic model called discounted stochastic game (Shapley, 1953). Our family of discrete-time FP procedures is proven to converge to the set of stationary Nash equilibria in identical interest discounted stochastic games. This extends similar convergence results for static games (Monderer & Shapley, 1996a). 
We then analyze the continuous-time counterpart of our FP procedures, which include as a particular case the best-response dynamics introduced and studied by Leslie et al. (2020) in the context of zero-sum stochastic games. We prove the convergence of these dynamics to stationary Nash equilibria in identical-interest and zero-sum discounted stochastic games. Thanks to stochastic approximations, we can infer from the continuous-time convergence some discrete-time results, such as the convergence to stationary equilibria in zero-sum and team stochastic games (Holler, 2020).' volume: 162 URL: https://proceedings.mlr.press/v162/baudin22a.html PDF: https://proceedings.mlr.press/v162/baudin22a/baudin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-baudin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lucas family: Baudin - given: Rida family: Laraki editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1664-1690 id: baudin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1664 lastpage: 1690 published: 2022-06-28 00:00:00 +0000 - title: 'Information Discrepancy in Strategic Learning' abstract: 'We initiate the study of the effects of non-transparency in decision rules on individuals’ ability to improve in strategic learning settings. Inspired by real-life settings, such as loan approvals and college admissions, we remove the assumption typically made in the strategic learning literature that the decision rule is fully known to individuals, and focus instead on settings where it is inaccessible. Lacking this knowledge, individuals try to infer the rule by learning from their peers (e.g., friends and acquaintances who previously applied for a loan), naturally forming groups in the population, each with a possibly different type and level of information regarding the decision rule. We show that, in equilibrium, the principal’s decision rule optimizing welfare across sub-populations may cause a strong negative externality: the true quality of some of the groups can actually deteriorate. On the positive side, we show that, in many natural cases, optimal improvement can be guaranteed simultaneously for all sub-populations. We further introduce a measure we term information overlap proxy, and demonstrate its usefulness in characterizing the disparity in improvements across sub-populations. Finally, we identify a natural condition under which improvement can be guaranteed for all sub-populations while maintaining high predictive accuracy. We complement our theoretical analysis with experiments on real-world datasets.'
volume: 162 URL: https://proceedings.mlr.press/v162/bechavod22a.html PDF: https://proceedings.mlr.press/v162/bechavod22a/bechavod22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bechavod22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yahav family: Bechavod - given: Chara family: Podimata - given: Steven family: Wu - given: Juba family: Ziani editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1691-1715 id: bechavod22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1691 lastpage: 1715 published: 2022-06-28 00:00:00 +0000 - title: 'On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces' abstract: 'We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space. To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror ascent-type updates and gradient tracking. Our main theoretical contribution is the establishment that this scheme converges with constant batch sizes, whereas prior works require these parameters to respectively shrink to null or grow to infinity. Experimentally, this scheme under a heavy-tailed policy parameterization yields improved reward accumulation across a variety of settings as compared with standard benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/bedi22a.html PDF: https://proceedings.mlr.press/v162/bedi22a/bedi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bedi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Amrit Singh family: Bedi - given: Souradip family: Chakraborty - given: Anjaly family: Parayil - given: Brian M family: Sadler - given: Pratap family: Tokekar - given: Alec family: Koppel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1716-1731 id: bedi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1716 lastpage: 1731 published: 2022-06-28 00:00:00 +0000 - title: 'Imitation Learning by Estimating Expertise of Demonstrators' abstract: 'Many existing imitation learning datasets are collected from multiple demonstrators, each with different expertise at different parts of the environment.
Yet, standard imitation learning algorithms typically treat all demonstrators as homogeneous, regardless of their expertise, absorbing the weaknesses of any suboptimal demonstrators. In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms. We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators. This enables our model to learn from the optimal behavior and filter out the suboptimal behavior of each demonstrator. Our model learns a single policy that can outperform even the best demonstrator, and can be used to estimate the expertise of any demonstrator at any state. We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess, out-performing competing methods in 21 out of 23 settings, with an average of 7% and up to 60% improvement in terms of the final reward.' volume: 162 URL: https://proceedings.mlr.press/v162/beliaev22a.html PDF: https://proceedings.mlr.press/v162/beliaev22a/beliaev22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-beliaev22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mark family: Beliaev - given: Andy family: Shih - given: Stefano family: Ermon - given: Dorsa family: Sadigh - given: Ramtin family: Pedarsani editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1732-1748 id: beliaev22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1732 lastpage: 1748 published: 2022-06-28 00:00:00 +0000 - title: 'Matching Normalizing Flows and Probability Paths on Manifolds' abstract: 'Continuous Normalizing Flows (CNFs) are a class of generative models that transform a prior distribution to a model distribution by solving an ordinary differential equation (ODE). We propose to train CNFs on manifolds by minimizing probability path divergence (PPD), a novel family of divergences between the probability density path generated by the CNF and a target probability density path. PPD is formulated using a logarithmic mass conservation formula which is a linear first order partial differential equation relating the log target probabilities and the CNF’s defining vector field. PPD has several key benefits over existing methods: it sidesteps the need to solve an ODE per iteration, readily applies to manifold data, scales to high dimensions, and is compatible with a large family of target paths interpolating pure noise and data in finite time. Theoretically, PPD is shown to bound classical probability divergences. Empirically, we show that CNFs learned by minimizing PPD achieve state-of-the-art results in likelihoods and sample quality on existing low-dimensional manifold benchmarks, and is the first example of a generative model to scale to moderately high dimensional manifolds.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ben-hamu22a.html PDF: https://proceedings.mlr.press/v162/ben-hamu22a/ben-hamu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ben-hamu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Heli family: Ben-Hamu - given: Samuel family: Cohen - given: Joey family: Bose - given: Brandon family: Amos - given: Maximillian family: Nickel - given: Aditya family: Grover - given: Ricky T. Q. family: Chen - given: Yaron family: Lipman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1749-1763 id: ben-hamu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1749 lastpage: 1763 published: 2022-06-28 00:00:00 +0000 - title: 'Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models' abstract: 'We consider the regret minimization task in a dueling bandits problem with context information. In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be compared with each other and receives feedback in the form of noisy preference information. We assume that the feedback process is determined by a linear stochastic transitivity model with contextualized utilities (CoLST), and the learner’s task is to include the best arm (with highest latent context-dependent utility) in the duel. We propose a computationally efficient algorithm, \Algo{CoLSTIM}, which makes its choice based on imitating the feedback process using perturbed context-dependent utility estimates of the underlying CoLST model. If each arm is associated with a $d$-dimensional feature vector, we show that \Algo{CoLSTIM} achieves a regret of order $\tilde O( \sqrt{dT})$ after $T$ learning rounds. Additionally, we also establish the optimality of \Algo{CoLSTIM} by showing a lower bound for the weak regret that refines the existing average regret analysis. Our experiments demonstrate its superiority over state-of-art algorithms for special cases of CoLST models.' volume: 162 URL: https://proceedings.mlr.press/v162/bengs22a.html PDF: https://proceedings.mlr.press/v162/bengs22a/bengs22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bengs22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Viktor family: Bengs - given: Aadirupa family: Saha - given: Eyke family: Hüllermeier editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1764-1786 id: bengs22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1764 lastpage: 1786 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Inverse Kinematic' abstract: 'Inverse kinematic (IK) methods recover the parameters of the joints, given the desired position of selected elements in the kinematic chain. While the problem is well-defined and low-dimensional, it has to be solved rapidly, accounting for multiple possible solutions. 
In this work, we propose a neural IK method that employs the hierarchical structure of the problem to sequentially sample valid joint angles conditioned on the desired position and on the preceding joints along the chain. In our solution, a hypernetwork $f$ recovers the parameters of multiple primary networks ($g_1,g_2,\dots,g_N$, where $N$ is the number of joints), such that each $g_i$ outputs a distribution of possible joint angles, and is conditioned on the sampled values obtained from the previous primary networks $g_j, j 0$ on the $(1+\varepsilon)^{th}$ central moment of the random variables, namely, for $\varepsilon \in (0,1]$ \[ \mathbb{E}_{X_1 \sim \mathcal{D}} \Big| X_1 - \mu \Big|^{1+\varepsilon} \leq \upsilon_{\varepsilon}. \] We provide a lower bound on the minimax error rate for the mean estimation problem under adversarial corruption under this weak assumption, and establish that the proposed M-estimator achieves this lower bound (up to multiplicative constants). When the variance is infinite, the tolerance to contamination of any estimator reduces as $\varepsilon \downarrow 0$. We establish a tight upper bound that characterizes this bargain. To illustrate the usefulness of the derived robust M-estimator in an online setting, we present a bandit algorithm for the partially identifiable best arm identification problem that improves upon the sample complexity of state-of-the-art algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/bhatt22a.html PDF: https://proceedings.mlr.press/v162/bhatt22a/bhatt22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bhatt22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sujay family: Bhatt - given: Guanhua family: Fang - given: Ping family: Li - given: Gennady family: Samorodnitsky editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1906-1924 id: bhatt22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1906 lastpage: 1924 published: 2022-06-28 00:00:00 +0000 - title: 'Nearly Optimal Catoni’s M-estimator for Infinite Variance' abstract: 'In this paper, we extend the remarkable M-estimator of Catoni \citep{Cat12} to situations where the variance is infinite. In particular, given a sequence of i.i.d. random variables $\{X_i\}_{i=1}^n$ from distribution $\mathcal{D}$ over $\mathbb{R}$ with mean $\mu$, we only assume the existence of a known upper bound $\upsilon_{\varepsilon} > 0$ on the $(1+\varepsilon)^{th}$ central moment of the random variables, namely, for $\varepsilon \in (0,1]$ \[ \mathbb{E}_{X_1 \sim \mathcal{D}} \Big| X_1 - \mu \Big|^{1+\varepsilon} \leq \upsilon_{\varepsilon}. \] The extension is non-trivial owing to the difficulty in characterizing the roots of certain polynomials of degree smaller than $2$. The proposed estimator has the same order of magnitude and the same asymptotic constant as in \citet{Cat12}, but for the case of bounded moments. We further propose a version of the estimator that does not even require knowledge of $\upsilon_{\varepsilon}$, but adapts the moment bound in a data-driven manner.
Finally, to illustrate the usefulness of the derived non-asymptotic confidence bounds, we consider an application in multi-armed bandits and propose best arm identification algorithms, in the fixed confidence setting, that outperform the state of the art.' volume: 162 URL: https://proceedings.mlr.press/v162/bhatt22b.html PDF: https://proceedings.mlr.press/v162/bhatt22b/bhatt22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bhatt22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sujay family: Bhatt - given: Guanhua family: Fang - given: Ping family: Li - given: Gennady family: Samorodnitsky editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1925-1944 id: bhatt22b issued: date-parts: - 2022 - 6 - 28 firstpage: 1925 lastpage: 1944 published: 2022-06-28 00:00:00 +0000 - title: 'Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning' abstract: 'Large-scale machine learning systems often involve data distributed across a collection of users. Federated learning algorithms leverage this structure by communicating model updates to a central server, rather than entire datasets. In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint) differential privacy. While learning a private global model induces a cost of privacy, local learning is perfectly private. We provide generalization guarantees showing that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy. We illustrate our theoretical results with experiments on synthetic and real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/bietti22a.html PDF: https://proceedings.mlr.press/v162/bietti22a/bietti22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bietti22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alberto family: Bietti - given: Chen-Yu family: Wei - given: Miroslav family: Dudik - given: John family: Langford - given: Steven family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1945-1962 id: bietti22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1945 lastpage: 1962 published: 2022-06-28 00:00:00 +0000 - title: 'Non-Vacuous Generalisation Bounds for Shallow Neural Networks' abstract: 'We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.' 
volume: 162 URL: https://proceedings.mlr.press/v162/biggs22a.html PDF: https://proceedings.mlr.press/v162/biggs22a/biggs22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-biggs22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Felix family: Biggs - given: Benjamin family: Guedj editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1963-1981 id: biggs22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1963 lastpage: 1981 published: 2022-06-28 00:00:00 +0000 - title: 'Structure-preserving GANs' abstract: 'Generative adversarial networks (GANs), a class of distribution-learning methods based on a two-player game between a generator and a discriminator, can generally be formulated as a minmax problem based on the variational representation of a divergence between the unknown and the generated distributions. We introduce structure-preserving GANs as a data-efficient framework for learning distributions with additional structure such as group symmetry, by developing new variational representations for divergences. Our theory shows that we can reduce the discriminator space to its projection on the invariant discriminator space, using the conditional expectation with respect to the sigma-algebra associated to the underlying structure. In addition, we prove that the discriminator space reduction must be accompanied by a careful design of structured generators, as flawed designs may easily lead to a catastrophic “mode collapse” of the learned distribution. We contextualize our framework by building symmetry-preserving GANs for distributions with intrinsic group symmetry, and demonstrate that both players, namely the equivariant generator and invariant discriminator, play important but distinct roles in the learning process. Empirical experiments and ablation studies across a broad range of data sets, including real-world medical imaging, validate our theory, and show our proposed methods achieve significantly improved sample fidelity and diversity—almost an order of magnitude measured in Frechet Inception Distance—especially in the small data regime.' volume: 162 URL: https://proceedings.mlr.press/v162/birrell22a.html PDF: https://proceedings.mlr.press/v162/birrell22a/birrell22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-birrell22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jeremiah family: Birrell - given: Markos family: Katsoulakis - given: Luc family: Rey-Bellet - given: Wei family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 1982-2020 id: birrell22a issued: date-parts: - 2022 - 6 - 28 firstpage: 1982 lastpage: 2020 published: 2022-06-28 00:00:00 +0000 - title: 'Scalable Spike-and-Slab' abstract: 'Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties. However, existing samplers for spike-and-slab posteriors incur prohibitive computational costs when the number of variables is large. 
In this article, we propose Scalable Spike-and-Slab ($S^3$), a scalable Gibbs sampling implementation for high-dimensional Bayesian regression with the continuous spike-and-slab prior of George & McCulloch (1993). For a dataset with $n$ observations and $p$ covariates, $S^3$ has order $\max\{n^2 p_t, np\}$ computational cost at iteration $t$, where $p_t$ never exceeds the number of covariates switching spike-and-slab states between iterations $t$ and $t-1$ of the Markov chain. This improves upon the order $n^2 p$ per-iteration cost of state-of-the-art implementations as, typically, $p_t$ is substantially smaller than $p$. We apply $S^3$ on synthetic and real-world datasets, demonstrating orders of magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.' volume: 162 URL: https://proceedings.mlr.press/v162/biswas22a.html PDF: https://proceedings.mlr.press/v162/biswas22a/biswas22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-biswas22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Niloy family: Biswas - given: Lester family: Mackey - given: Xiao-Li family: Meng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2021-2040 id: biswas22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2021 lastpage: 2040 published: 2022-06-28 00:00:00 +0000 - title: 'Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate a Combination of the Same Core Quantities' abstract: 'It is an important problem in trustworthy machine learning to recognize out-of-distribution (OOD) inputs which are inputs unrelated to the in-distribution task. Many out-of-distribution detection methods have been suggested in recent years. The goal of this paper is to recognize common objectives as well as to identify the implicit scoring functions of different OOD detection methods. We focus on the sub-class of methods that use surrogate OOD data during training in order to learn an OOD detection score that generalizes to new unseen out-distributions at test time. We show that binary discrimination between in- and (different) out-distributions is equivalent to several distinct formulations of the OOD detection problem. When trained in a shared fashion with a standard classifier, this binary discriminator reaches an OOD detection performance similar to that of Outlier Exposure. Moreover, we show that the confidence loss which is used by Outlier Exposure has an implicit scoring function which differs in a non-trivial fashion from the theoretically optimal scoring function in the case where training and test out-distribution are the same, which again is similar to the one used when training an Energy-Based OOD detector or when adding a background class. In practice, when trained in exactly the same way, all these methods perform similarly.'
volume: 162 URL: https://proceedings.mlr.press/v162/bitterwolf22a.html PDF: https://proceedings.mlr.press/v162/bitterwolf22a/bitterwolf22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bitterwolf22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Julian family: Bitterwolf - given: Alexander family: Meinke - given: Maximilian family: Augustin - given: Matthias family: Hein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2041-2074 id: bitterwolf22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2041 lastpage: 2074 published: 2022-06-28 00:00:00 +0000 - title: 'A query-optimal algorithm for finding counterfactuals' abstract: 'We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model $f : X^d \to \{0,1\}$ and instance $x^\star$, our algorithm makes \[ S(f)^{O(\Delta_f(x^\star))} \cdot \log d \] queries to $f$ and returns an optimal counterfactual for $x^\star$: a nearest instance $x'$ to $x^\star$ for which $f(x')\ne f(x^\star)$. Here $S(f)$ is the sensitivity of $f$, a discrete analogue of the Lipschitz constant, and $\Delta_f(x^\star)$ is the distance from $x^\star$ to its nearest counterfactuals. The previous best known query complexity was $d^{\,O(\Delta_f(x^\star))}$, achievable by brute-force local search. We further prove a lower bound of $S(f)^{\Omega(\Delta_f(x^\star))} + \Omega(\log d)$ on the query complexity of any algorithm, thereby showing that the guarantees of our algorithm are essentially optimal.' volume: 162 URL: https://proceedings.mlr.press/v162/blanc22a.html PDF: https://proceedings.mlr.press/v162/blanc22a/blanc22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-blanc22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Guy family: Blanc - given: Caleb family: Koch - given: Jane family: Lange - given: Li-Yang family: Tan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2075-2090 id: blanc22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2075 lastpage: 2090 published: 2022-06-28 00:00:00 +0000 - title: 'Popular decision tree algorithms are provably noise tolerant' abstract: 'Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. Our guarantees hold under the strongest noise model of nasty noise, and we provide near-matching upper and lower bounds on the allowable noise rate. We further show that these algorithms, which are simple and have long been central to everyday machine learning, enjoy provable guarantees in the noisy setting that are unmatched by existing algorithms in the theoretical literature on decision tree learning. Taken together, our results add to an ongoing line of research that seeks to place the empirical success of these practical decision tree algorithms on firm theoretical footing.'
volume: 162 URL: https://proceedings.mlr.press/v162/blanc22b.html PDF: https://proceedings.mlr.press/v162/blanc22b/blanc22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-blanc22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Guy family: Blanc - given: Jane family: Lange - given: Ali family: Malik - given: Li-Yang family: Tan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2091-2106 id: blanc22b issued: date-parts: - 2022 - 6 - 28 firstpage: 2091 lastpage: 2106 published: 2022-06-28 00:00:00 +0000 - title: 'Optimizing Sequential Experimental Design with Deep Reinforcement Learning' abstract: 'Bayesian approaches developed to solve the optimal design of sequential experiments are mathematically elegant but computationally challenging. Recently, techniques using amortization have been proposed to make these Bayesian approaches practical, by training a parameterized policy that proposes designs efficiently at deployment time. However, these methods may not sufficiently explore the design space, require access to a differentiable probabilistic model and can only optimize over continuous design spaces. Here, we address these limitations by showing that the problem of optimizing policies can be reduced to solving a Markov decision process (MDP). We solve the equivalent MDP with modern deep reinforcement learning techniques. Our experiments show that our approach is also computationally efficient at deployment time and exhibits state-of-the-art performance on both continuous and discrete design spaces, even when the probabilistic model is a black box.' volume: 162 URL: https://proceedings.mlr.press/v162/blau22a.html PDF: https://proceedings.mlr.press/v162/blau22a/blau22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-blau22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tom family: Blau - given: Edwin V. family: Bonilla - given: Iadine family: Chades - given: Amir family: Dezfouli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2107-2128 id: blau22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2107 lastpage: 2128 published: 2022-06-28 00:00:00 +0000 - title: 'Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)' abstract: 'This paper discusses a new approach to the fundamental problem of learning optimal Q-functions. In this approach, optimal Q-functions are formulated as saddle points of a nonlinear Lagrangian function derived from the classic Bellman optimality equation. The paper shows that the Lagrangian enjoys strong duality, in spite of its nonlinearity, which paves the way to a general Lagrangian method to Q-function learning. As a demonstration, the paper develops an imitation learning algorithm based on the duality theory, and applies the algorithm to a state-of-the-art machine translation benchmark. 
The paper then turns to demonstrate a symmetry breaking phenomenon regarding the optimality of the Lagrangian saddle points, which justifies a largely overlooked direction in developing the Lagrangian method.' volume: 162 URL: https://proceedings.mlr.press/v162/bojun22a.html PDF: https://proceedings.mlr.press/v162/bojun22a/bojun22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bojun22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Huang family: Bojun editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2129-2159 id: bojun22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2129 lastpage: 2159 published: 2022-06-28 00:00:00 +0000 - title: 'Generalized Results for the Existence and Consistency of the MLE in the Bradley-Terry-Luce Model' abstract: 'Ranking problems based on pairwise comparisons, such as those arising in online gaming, often involve a large pool of items to order. In these situations, the gap in performance between any two items can be significant, and the smallest and largest winning probabilities can be very close to zero or one. Furthermore, each item may be compared only to a subset of all the items, so that not all pairwise comparisons are observed. In this paper, we study the performance of the Bradley-Terry-Luce model for ranking from pairwise comparison data under more realistic settings than those considered in the literature so far. In particular, we allow for near-degenerate winning probabilities and arbitrary comparison designs. We obtain novel results about the existence of the maximum likelihood estimator (MLE) and the corresponding $\ell_2$ estimation error without the bounded winning probability assumption commonly used in the literature and for arbitrary comparison graph topologies. Central to our approach is the reliance on the Fisher information matrix to express the dependence on the graph topologies and the impact of the values of the winning probabilities on the estimation risk and on the conditions for the existence of the MLE. Our bounds recover existing results as special cases but are more broadly applicable.' volume: 162 URL: https://proceedings.mlr.press/v162/bong22a.html PDF: https://proceedings.mlr.press/v162/bong22a/bong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Heejong family: Bong - given: Alessandro family: Rinaldo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2160-2177 id: bong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2160 lastpage: 2177 published: 2022-06-28 00:00:00 +0000 - title: 'How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective' abstract: 'Recent works have examined theoretical and empirical properties of wide neural networks trained in the Neural Tangent Kernel (NTK) regime. 
Given that biological neural networks are much wider than their artificial counterparts, we consider NTK regime wide neural networks as a possible model of biological neural networks. Leveraging NTK theory, we show theoretically that gradient descent drives layerwise weight updates that are aligned with their input activity correlations weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. The alignment result allows us to formulate a family of biologically-motivated, backpropagation-free learning rules that are theoretically equivalent to backpropagation in infinite-width networks. We test these learning rules on benchmark problems in feedforward and recurrent neural networks and demonstrate, in wide networks, comparable performance to backpropagation. The proposed rules are particularly effective in low data regimes, which are common in biological learning settings.' volume: 162 URL: https://proceedings.mlr.press/v162/boopathy22a.html PDF: https://proceedings.mlr.press/v162/boopathy22a/boopathy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-boopathy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Akhilan family: Boopathy - given: Ila family: Fiete editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2178-2205 id: boopathy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2178 lastpage: 2205 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Language Models by Retrieving from Trillions of Tokens' abstract: 'We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25{\texttimes} fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.' 
volume: 162 URL: https://proceedings.mlr.press/v162/borgeaud22a.html PDF: https://proceedings.mlr.press/v162/borgeaud22a/borgeaud22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-borgeaud22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sebastian family: Borgeaud - given: Arthur family: Mensch - given: Jordan family: Hoffmann - given: Trevor family: Cai - given: Eliza family: Rutherford - given: Katie family: Millican - given: George Bm family: Van Den Driessche - given: Jean-Baptiste family: Lespiau - given: Bogdan family: Damoc - given: Aidan family: Clark - given: Diego family: De Las Casas - given: Aurelia family: Guy - given: Jacob family: Menick - given: Roman family: Ring - given: Tom family: Hennigan - given: Saffron family: Huang - given: Loren family: Maggiore - given: Chris family: Jones - given: Albin family: Cassirer - given: Andy family: Brock - given: Michela family: Paganini - given: Geoffrey family: Irving - given: Oriol family: Vinyals - given: Simon family: Osindero - given: Karen family: Simonyan - given: Jack family: Rae - given: Erich family: Elsen - given: Laurent family: Sifre editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2206-2240 id: borgeaud22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2206 lastpage: 2240 published: 2022-06-28 00:00:00 +0000 - title: 'Lie Point Symmetry Data Augmentation for Neural PDE Solvers' abstract: 'Neural networks are increasingly being used to solve partial differential equations (PDEs), replacing slower numerical solvers. However, a critical issue is that neural PDE solvers require high-quality ground truth data, which usually must come from the very solvers they are designed to replace. Thus, we are presented with a proverbial chicken-and-egg problem. In this paper, we present a method, which can partially alleviate this problem, by improving neural PDE solver sample complexity—Lie point symmetry data augmentation (LPSDA). In the context of PDEs, it turns out we are able to quantitatively derive an exhaustive list of data transformations, based on the Lie point symmetry group of the PDEs in question, something not possible in other application areas. We present this framework and demonstrate how it can easily be deployed to improve neural PDE solver sample complexity by an order of magnitude.' 
volume: 162 URL: https://proceedings.mlr.press/v162/brandstetter22a.html PDF: https://proceedings.mlr.press/v162/brandstetter22a/brandstetter22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-brandstetter22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Johannes family: Brandstetter - given: Max family: Welling - given: Daniel E family: Worrall editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2241-2256 id: brandstetter22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2241 lastpage: 2256 published: 2022-06-28 00:00:00 +0000 - title: 'An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees' abstract: 'Real-world networks often come with side information that can help to improve the performance of network analysis tasks such as clustering. Despite a large number of empirical and theoretical studies conducted on network clustering methods during the past decade, the added value of side information and the methods used to incorporate it optimally in clustering algorithms are relatively less understood. We propose a new iterative algorithm to cluster networks with side information for nodes (in the form of covariates) and show that our algorithm is optimal under the Contextual Symmetric Stochastic Block Model. Our algorithm can be applied to general Contextual Stochastic Block Models and avoids hyperparameter tuning in contrast to previously proposed methods. We confirm our theoretical results on synthetic data experiments where our algorithm significantly outperforms other methods, and show that it can also be applied to signed graphs. Finally we demonstrate the practical interest of our method on real data.' volume: 162 URL: https://proceedings.mlr.press/v162/braun22a.html PDF: https://proceedings.mlr.press/v162/braun22a/braun22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-braun22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Guillaume family: Braun - given: Hemant family: Tyagi - given: Christophe family: Biernacki editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2257-2291 id: braun22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2257 lastpage: 2291 published: 2022-06-28 00:00:00 +0000 - title: 'Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems' abstract: 'In many scientific disciplines, we are interested in inferring the nonlinear dynamical system underlying a set of observed time series, a challenging task in the face of chaotic behavior and noise. Previous deep learning approaches toward this goal often suffered from a lack of interpretability and tractability. In particular, the high-dimensional latent spaces often required for a faithful embedding, even when the underlying dynamics lives on a lower-dimensional manifold, can hamper theoretical analysis. 
Motivated by the emerging principles of dendritic computation, we augment a dynamically interpretable and mathematically tractable piecewise-linear (PL) recurrent neural network (RNN) by a linear spline basis expansion. We show that this approach retains all the theoretically appealing properties of the simple PLRNN, yet boosts its capacity for approximating arbitrary nonlinear dynamical systems in comparatively low dimensions. We employ two frameworks for training the system, one combining BPTT with teacher forcing, and another based on fast and scalable variational inference. We show that the dendritically expanded PLRNN achieves better reconstructions with fewer parameters and dimensions on various dynamical systems benchmarks and compares favorably to other methods, while retaining a tractable and interpretable structure.' volume: 162 URL: https://proceedings.mlr.press/v162/brenner22a.html PDF: https://proceedings.mlr.press/v162/brenner22a/brenner22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-brenner22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Manuel family: Brenner - given: Florian family: Hess - given: Jonas M family: Mikhaeil - given: Leonard F family: Bereska - given: Zahra family: Monfared - given: Po-Chen family: Kuo - given: Daniel family: Durstewitz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2292-2320 id: brenner22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2292 lastpage: 2320 published: 2022-06-28 00:00:00 +0000 - title: 'Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters' abstract: 'This paper introduces a novel and generic framework to solve the flagship task of supervised labeled graph prediction by leveraging Optimal Transport tools. We formulate the problem as regression with the Fused Gromov-Wasserstein (FGW) loss and propose a predictive model relying on a FGW barycenter whose weights depend on inputs. First we introduce a non-parametric estimator based on kernel ridge regression for which theoretical results such as consistency and excess risk bound are proved. Next we propose an interpretable parametric model where the barycenter weights are modeled with a neural network and the graphs on which the FGW barycenter is calculated are additionally learned. Numerical experiments show the strength of the method and its ability to interpolate in the labeled graph space on simulated data and on a difficult metabolic identification problem where it can reach very good performance with very little engineering.' 
volume: 162 URL: https://proceedings.mlr.press/v162/brogat-motte22a.html PDF: https://proceedings.mlr.press/v162/brogat-motte22a/brogat-motte22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-brogat-motte22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Luc family: Brogat-Motte - given: Rémi family: Flamary - given: Celine family: Brouard - given: Juho family: Rousu - given: Florence family: D’Alché-Buc editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2321-2335 id: brogat-motte22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2321 lastpage: 2335 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Learning of CNNs using Patch Based Features' abstract: 'Recent work has demonstrated the effectiveness of using patch based representations when learning from image data. Here we provide theoretical support for this observation, by showing that a simple semi-supervised algorithm that uses patch statistics can efficiently learn labels produced by a one-hidden-layer Convolutional Neural Network (CNN). Since CNNs are known to be computationally hard to learn in the worst case, our analysis holds under some distributional assumptions. We show that these assumptions are necessary and sufficient for our results to hold. We verify that the distributional assumptions hold on real-world data by experimenting on the CIFAR-10 dataset, and find that the analyzed algorithm outperforms a vanilla one-hidden-layer CNN. Finally, we demonstrate that by running the algorithm in a layer-by-layer fashion we can build a deep model which gives further improvements, hinting that this method provides insights about the behavior of deep CNNs.' volume: 162 URL: https://proceedings.mlr.press/v162/brutzkus22a.html PDF: https://proceedings.mlr.press/v162/brutzkus22a/brutzkus22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-brutzkus22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alon family: Brutzkus - given: Amir family: Globerson - given: Eran family: Malach - given: Alon Regev family: Netser - given: Shai family: Shalev-Schwartz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2336-2356 id: brutzkus22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2336 lastpage: 2356 published: 2022-06-28 00:00:00 +0000 - title: 'Causal structure-based root cause analysis of outliers' abstract: 'Current techniques for explaining outliers cannot tell what caused the outliers. We present a formal method to identify "root causes" of outliers, amongst variables. The method requires a causal graph of the variables along with the functional causal model. It quantifies the contribution of each variable to the target outlier score, which explains to what extent each variable is a "root cause" of the target outlier. We study the empirical performance of the method through simulations and present a real-world case study identifying "root causes" of extreme river flows.' 
volume: 162 URL: https://proceedings.mlr.press/v162/budhathoki22a.html PDF: https://proceedings.mlr.press/v162/budhathoki22a/budhathoki22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-budhathoki22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kailash family: Budhathoki - given: Lenon family: Minorics - given: Patrick family: Bloebaum - given: Dominik family: Janzing editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2357-2369 id: budhathoki22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2357 lastpage: 2369 published: 2022-06-28 00:00:00 +0000 - title: 'IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages' abstract: 'Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together{—}by both aggregating pre-existing datasets and creating new ones{—}visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of available unlabelled textual data for pretraining, and only weakly by the typological distance of target{–}source languages. We hope to encourage future research efforts in this area by releasing the benchmark to the community.' volume: 162 URL: https://proceedings.mlr.press/v162/bugliarello22a.html PDF: https://proceedings.mlr.press/v162/bugliarello22a/bugliarello22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-bugliarello22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Emanuele family: Bugliarello - given: Fangyu family: Liu - given: Jonas family: Pfeiffer - given: Siva family: Reddy - given: Desmond family: Elliott - given: Edoardo Maria family: Ponti - given: Ivan family: Vulić editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2370-2392 id: bugliarello22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2370 lastpage: 2392 published: 2022-06-28 00:00:00 +0000 - title: 'Interactive Inverse Reinforcement Learning for Cooperative Games' abstract: 'We study the problem of designing autonomous agents that can learn to cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov decision process. 
We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent is acting so as to maximise expected utility given the first agent’s policy. How should the first agent act in order to learn the joint reward function as quickly as possible and so that the joint policy is as close to optimal as possible? We analyse how knowledge about the reward function can be gained in this interactive two-agent scenario. We show that when the learning agent’s policies have a significant effect on the transition function, the reward function can be learned efficiently.' volume: 162 URL: https://proceedings.mlr.press/v162/buning22a.html PDF: https://proceedings.mlr.press/v162/buning22a/buning22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-buning22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Thomas Kleine family: Büning - given: Anne-Marie family: George - given: Christos family: Dimitrakakis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2393-2413 id: buning22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2393 lastpage: 2413 published: 2022-06-28 00:00:00 +0000 - title: 'Convolutional and Residual Networks Provably Contain Lottery Tickets' abstract: 'The Lottery Ticket Hypothesis continues to have a profound practical impact on the quest for small scale deep neural networks that solve modern deep learning tasks at competitive performance. These lottery tickets are identified by pruning large randomly initialized neural networks with architectures that are as diverse as their applications. Yet, theoretical insights that attest to their existence have been mostly focused on deep fully-connected feed forward networks with ReLU activation functions. We prove that modern architectures consisting of convolutional and residual layers, which can be equipped with almost arbitrary activation functions, can also contain lottery tickets with high probability.' volume: 162 URL: https://proceedings.mlr.press/v162/burkholz22a.html PDF: https://proceedings.mlr.press/v162/burkholz22a/burkholz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-burkholz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rebekka family: Burkholz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2414-2433 id: burkholz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2414 lastpage: 2433 published: 2022-06-28 00:00:00 +0000 - title: 'Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path' abstract: 'We revisit the incremental autonomous exploration problem proposed by Lim and Auer (2012). In this setting, the agent aims to learn a set of near-optimal goal-conditioned policies to reach the $L$-controllable states: states that are incrementally reachable from an initial state $s_0$ within $L$ steps in expectation. We introduce a new algorithm with stronger sample complexity bounds than existing ones. 
Furthermore, we also prove the first lower bound for the autonomous exploration problem. In particular, the lower bound implies that our proposed algorithm, Value-Aware Autonomous Exploration, is nearly minimax-optimal when the number of $L$-controllable states grows polynomially with respect to $L$. Key in our algorithm design is a connection between autonomous exploration and multi-goal stochastic shortest path, a new problem that naturally generalizes the classical stochastic shortest path problem. This new problem and its connection to autonomous exploration can be of independent interest.' volume: 162 URL: https://proceedings.mlr.press/v162/cai22a.html PDF: https://proceedings.mlr.press/v162/cai22a/cai22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cai22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haoyuan family: Cai - given: Tengyu family: Ma - given: Simon family: Du editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2434-2456 id: cai22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2434 lastpage: 2456 published: 2022-06-28 00:00:00 +0000 - title: 'Convergence of Invariant Graph Networks' abstract: 'Although theoretical properties such as expressive power and over-smoothing of graph neural networks (GNN) have been extensively studied recently, their convergence properties are a relatively new direction. In this paper, we investigate the convergence of one powerful GNN, the Invariant Graph Network (IGN), over graphs sampled from graphons. We first prove the stability of linear layers for general $k$-IGN (of order $k$) based on a novel interpretation of linear equivariant layers. Building upon this result, we prove the convergence of $k$-IGN under the model of \citet{ruiz2020graphon}, where we access the edge weight but the convergence error is measured for graphon inputs. Under the more natural (and more challenging) setting of \citet{keriven2020convergence} where one can only access the 0-1 adjacency matrix sampled according to edge probability, we first show a negative result that the convergence of any IGN is not possible. We then obtain the convergence of a subset of IGNs, denoted as IGN-small, after the edge probability estimation. We show that IGN-small still contains a function class rich enough to approximate spectral GNNs arbitrarily well. Lastly, we perform experiments on various graphon models to verify our statements.' 
volume: 162 URL: https://proceedings.mlr.press/v162/cai22b.html PDF: https://proceedings.mlr.press/v162/cai22b/cai22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cai22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chen family: Cai - given: Yusu family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2457-2484 id: cai22b issued: date-parts: - 2022 - 6 - 28 firstpage: 2457 lastpage: 2484 published: 2022-06-28 00:00:00 +0000 - title: 'Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency' abstract: 'We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces, which remains less investigated theoretically. To this end, we make the first attempt at bridging partial observability and function approximation for a class of POMDPs with a linear structure. In detail, we propose a reinforcement learning algorithm (Optimistic Exploration via Adversarial Integral Equation or OP-TENET) that attains an $\epsilon$-optimal policy within $O(1/\epsilon^2)$ episodes. In particular, the sample complexity scales polynomially in the intrinsic dimension of the linear structure and is independent of the size of the observation and state spaces. The sample efficiency of OP-TENET is enabled by a sequence of ingredients: (i) a Bellman operator with finite memory, which represents the value function in a recursive manner, (ii) the identification and estimation of such an operator via an adversarial integral equation, which features a smoothed discriminator tailored to the linear structure, and (iii) the exploration of the observation and state spaces via optimism, which is based on quantifying the uncertainty in the adversarial integral equation.' volume: 162 URL: https://proceedings.mlr.press/v162/cai22c.html PDF: https://proceedings.mlr.press/v162/cai22c/cai22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cai22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qi family: Cai - given: Zhuoran family: Yang - given: Zhaoran family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2485-2522 id: cai22c issued: date-parts: - 2022 - 6 - 28 firstpage: 2485 lastpage: 2522 published: 2022-06-28 00:00:00 +0000 - title: 'Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times' abstract: 'Computing a Gaussian process (GP) posterior has a computational cost cubic in the number of historical points. A reformulation of the same GP posterior highlights that this complexity mainly depends on how many unique historical points are considered. This can have important implications in active learning settings, where the set of historical points is constructed sequentially by the learner. 
We show that sequential black-box optimization based on GPs (GP-Opt) can be made efficient by sticking to a candidate solution for multiple evaluation steps and switching only when necessary. Limiting the number of switches also limits the number of unique points in the history of the GP. Thus, the efficient GP reformulation can be used to exactly and cheaply compute the posteriors required to run the GP-Opt algorithms. This approach is especially useful in real-world applications of GP-Opt with high switch costs (e.g. switching chemicals in wet labs, data/model loading in hyperparameter optimization). As examples of this meta-approach, we modify two well-established GP-Opt algorithms, GP-UCB and GP-EI, to switch candidates as infrequently as possible by adapting rules from batched GP-Opt. These versions preserve all the theoretical no-regret guarantees while improving practical aspects of the algorithms such as runtime, memory complexity, and the ability to batch candidates and evaluate them in parallel.' volume: 162 URL: https://proceedings.mlr.press/v162/calandriello22a.html PDF: https://proceedings.mlr.press/v162/calandriello22a/calandriello22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-calandriello22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniele family: Calandriello - given: Luigi family: Carratino - given: Alessandro family: Lazaric - given: Michal family: Valko - given: Lorenzo family: Rosasco editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2523-2541 id: calandriello22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2523 lastpage: 2541 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Gaussian Process Change Point Detection' abstract: 'Detecting change points in time series, i.e., points in time at which some observed process suddenly changes, is a fundamental task that arises in many real-world applications, with consequences for safety and reliability. In this work, we propose ADAGA, a novel Gaussian process-based solution to this problem that leverages a powerful heuristic we developed based on statistical hypothesis testing. In contrast to prior approaches, ADAGA adapts to changes in both the mean and covariance structure of the temporal process. In extensive experiments, we show its versatility and applicability to different classes of change points, demonstrating that it is significantly more accurate than current state-of-the-art alternatives.' 
volume: 162 URL: https://proceedings.mlr.press/v162/caldarelli22a.html PDF: https://proceedings.mlr.press/v162/caldarelli22a/caldarelli22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-caldarelli22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Edoardo family: Caldarelli - given: Philippe family: Wenk - given: Stefan family: Bauer - given: Andreas family: Krause editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2542-2571 id: caldarelli22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2542 lastpage: 2571 published: 2022-06-28 00:00:00 +0000 - title: 'Measuring dissimilarity with diffeomorphism invariance' abstract: 'Measures of similarity (or dissimilarity) are a key ingredient to many machine learning algorithms. We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces, which leverages the data’s internal structure to be invariant to diffeomorphisms. We prove that DID enjoys properties which make it relevant for theoretical study and practical use. By representing each datum as a function, DID is defined as the solution to an optimization problem in a Reproducing Kernel Hilbert Space and can be expressed in closed-form. In practice, it can be efficiently approximated via Nystr{ö}m sampling. Empirical experiments support the merits of DID.' volume: 162 URL: https://proceedings.mlr.press/v162/cantelobre22a.html PDF: https://proceedings.mlr.press/v162/cantelobre22a/cantelobre22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cantelobre22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Théophile family: Cantelobre - given: Carlo family: Ciliberto - given: Benjamin family: Guedj - given: Alessandro family: Rudi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2572-2596 id: cantelobre22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2572 lastpage: 2596 published: 2022-06-28 00:00:00 +0000 - title: 'A Model-Agnostic Randomized Learning Framework based on Random Hypothesis Subspace Sampling' abstract: 'We propose a model-agnostic randomized learning framework based on Random Hypothesis Subspace Sampling (RHSS). Given any hypothesis class, it randomly samples $k$ hypotheses and learns a near-optimal model from their span by simply solving a linear least square problem in $O(n k^2)$ time, where $n$ is the number of training instances. On the theory side, we derive the performance guarantee of RHSS from a generic subspace approximation perspective, leveraging properties of metric entropy and random matrices. On the practical side, we apply the RHSS framework to learn kernel, network and tree based models. Experimental results show they converge efficiently as $k$ increases and outperform their model-specific counterparts including random Fourier feature, random vector functional link and extra tree on real-world data sets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/cao22a.html PDF: https://proceedings.mlr.press/v162/cao22a/cao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yiting family: Cao - given: Chao family: Lan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2597-2608 id: cao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2597 lastpage: 2608 published: 2022-06-28 00:00:00 +0000 - title: 'Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications' abstract: 'Gaussian processes have become a promising tool for various safety-critical settings, since the posterior variance can be used to directly estimate the model error and quantify risk. However, state-of-the-art techniques for safety-critical settings hinge on the assumption that the kernel hyperparameters are known, which does not apply in general. To mitigate this, we introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters. Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error of a Gaussian process with arbitrary hyperparameters. We do not require any bounds on the hyperparameters to be known a priori, an assumption commonly made in related work. Instead, we are able to derive bounds from data in an intuitive fashion. We additionally employ the proposed technique to derive performance guarantees for a class of learning-based control problems. Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.' volume: 162 URL: https://proceedings.mlr.press/v162/capone22a.html PDF: https://proceedings.mlr.press/v162/capone22a/capone22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-capone22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexandre family: Capone - given: Armin family: Lederer - given: Sandra family: Hirche editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2609-2624 id: capone22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2609 lastpage: 2624 published: 2022-06-28 00:00:00 +0000 - title: 'Burst-Dependent Plasticity and Dendritic Amplification Support Target-Based Learning and Hierarchical Imitation Learning' abstract: 'The brain can learn to solve a wide range of tasks with high temporal and energetic efficiency. However, most biological models are composed of simple single-compartment neurons and cannot achieve the state-of-the-art performances of artificial intelligence. We propose a multi-compartment model of a pyramidal neuron, in which bursts and dendritic input segregation make it possible to plausibly support biological target-based learning. 
In target-based learning, the internal solution of a problem (a spatio-temporal pattern of bursts in our case) is suggested to the network, bypassing the problems of error backpropagation and credit assignment. Finally, we show that this neuronal architecture naturally supports the orchestration of “hierarchical imitation learning”, enabling the decomposition of challenging long-horizon decision-making tasks into simpler subtasks.' volume: 162 URL: https://proceedings.mlr.press/v162/capone22b.html PDF: https://proceedings.mlr.press/v162/capone22b/capone22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-capone22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Cristiano family: Capone - given: Cosimo family: Lupo - given: Paolo family: Muratore - given: Pier Stanislao family: Paolucci editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2625-2637 id: capone22b issued: date-parts: - 2022 - 6 - 28 firstpage: 2625 lastpage: 2637 published: 2022-06-28 00:00:00 +0000 - title: 'A Marriage between Adversarial Team Games and 2-player Games: Enabling Abstractions, No-regret Learning, and Subgame Solving' abstract: 'Ex ante correlation is becoming the mainstream approach for sequential adversarial team games, where a team of players faces another team in a zero-sum game. It is known that team members’ asymmetric information makes both equilibrium computation \textsf{APX}-hard and team’s strategies not directly representable on the game tree. This latter issue prevents the adoption of successful tools for huge 2-player zero-sum games such as, e.g., abstractions, no-regret learning, and subgame solving. This work shows that we can recover from this weakness by bridging the gap between sequential adversarial team games and 2-player games. In particular, we propose a new, suitable game representation that we call team-public-information, in which a team is represented as a single coordinator who only knows information common to the whole team and prescribes to each member an action for any possible private state. The resulting representation is highly explainable, being a 2-player tree in which the team’s strategies are behavioral with a direct interpretation and more expressive than the original extensive form when designing abstractions. Furthermore, we prove payoff equivalence of our representation, and we provide techniques that, starting directly from the extensive form, generate dramatically more compact representations without information loss. Finally, we experimentally evaluate our techniques when applied to a standard testbed, comparing their performance with the current state of the art.' 
volume: 162 URL: https://proceedings.mlr.press/v162/carminati22a.html PDF: https://proceedings.mlr.press/v162/carminati22a/carminati22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-carminati22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Luca family: Carminati - given: Federico family: Cacciamani - given: Marco family: Ciccone - given: Nicola family: Gatti editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2638-2657 id: carminati22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2638 lastpage: 2657 published: 2022-06-28 00:00:00 +0000 - title: 'RECAPP: Crafting a More Efficient Catalyst for Convex Optimization' abstract: 'The accelerated proximal point method (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal point to high accuracy. In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high accuracy subproblem solutions. We apply RECAPP to two canonical problems: finite-sum and max-structured minimization. For finite-sum problems, we match the best known complexity, previously obtained by carefully-designed problem-specific algorithms. For minimizing $\max_y f(x,y)$ where $f$ is convex in $x$ and strongly concave in $y$, we improve on the best known (Catalyst-based) bound by a logarithmic factor.' volume: 162 URL: https://proceedings.mlr.press/v162/carmon22a.html PDF: https://proceedings.mlr.press/v162/carmon22a/carmon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-carmon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yair family: Carmon - given: Arun family: Jambulapati - given: Yujia family: Jin - given: Aaron family: Sidford editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2658-2685 id: carmon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2658 lastpage: 2685 published: 2022-06-28 00:00:00 +0000 - title: 'Estimating and Penalizing Induced Preference Shifts in Recommender Systems' abstract: 'The content that a recommender system (RS) shows to users influences them. Therefore, when choosing a recommender to deploy, one is implicitly also choosing to induce specific internal states in users. Even more, systems trained via long-horizon optimization will have direct incentives to manipulate users, e.g. shift their preferences so they are easier to satisfy. We focus on induced preference shifts in users. We argue that {–} before deployment {–} system designers should: estimate the shifts a recommender would induce; evaluate whether such shifts would be undesirable; and perhaps even actively optimize to avoid problematic shifts. 
These steps involve two challenging ingredients: estimation requires anticipating how hypothetical policies would influence user preferences if deployed {–} we do this by using historical user interaction data to train a predictive user model which implicitly contains their preference dynamics; evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted {–} we use the notion of "safe shifts", which define a trust region within which behavior is safe: for instance, the natural way in which users would shift without interference from the system could be deemed "safe". In simulated experiments, we show that our learned preference dynamics model is effective in estimating user preferences and how they would respond to new recommenders. Additionally, we show that recommenders that optimize for staying in the trust region can avoid manipulative behaviors while still generating engagement.' volume: 162 URL: https://proceedings.mlr.press/v162/carroll22a.html PDF: https://proceedings.mlr.press/v162/carroll22a/carroll22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-carroll22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Micah D family: Carroll - given: Anca family: Dragan - given: Stuart family: Russell - given: Dylan family: Hadfield-Menell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2686-2708 id: carroll22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2686 lastpage: 2708 published: 2022-06-28 00:00:00 +0000 - title: 'YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone' abstract: 'YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity and with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from those seen during training.' 
volume: 162 URL: https://proceedings.mlr.press/v162/casanova22a.html PDF: https://proceedings.mlr.press/v162/casanova22a/casanova22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-casanova22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Edresson family: Casanova - given: Julian family: Weber - given: Christopher D family: Shulby - given: Arnaldo Candido family: Junior - given: Eren family: Gölge - given: Moacir A family: Ponti editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2709-2720 id: casanova22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2709 lastpage: 2720 published: 2022-06-28 00:00:00 +0000 - title: 'The Infinite Contextual Graph Markov Model' abstract: 'The Contextual Graph Markov Model (CGMM) is a deep, unsupervised, and probabilistic model for graphs that is trained incrementally on a layer-by-layer basis. As with most Deep Graph Networks, an inherent limitation is the need to perform an extensive model selection to choose the proper size of each layer’s latent representation. In this paper, we address this problem by introducing the Infinite Contextual Graph Markov Model (iCGMM), the first deep Bayesian nonparametric model for graph learning. During training, iCGMM can adapt the complexity of each layer to better fit the underlying data distribution. On 8 graph classification tasks, we show that iCGMM: i) successfully recovers or improves CGMM’s performances while reducing the hyper-parameters’ search space; ii) performs comparably to most end-to-end supervised methods. The results include studies on the importance of depth, hyper-parameters, and compression of the graph embeddings. We also introduce a novel approximated inference procedure that better deals with larger graph topologies.' volume: 162 URL: https://proceedings.mlr.press/v162/castellana22a.html PDF: https://proceedings.mlr.press/v162/castellana22a/castellana22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-castellana22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniele family: Castellana - given: Federico family: Errica - given: Davide family: Bacciu - given: Alessio family: Micheli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2721-2737 id: castellana22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2721 lastpage: 2737 published: 2022-06-28 00:00:00 +0000 - title: 'Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data' abstract: 'We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. In C-VFL, a server and multiple parties collaboratively train a model on their respective features utilizing several local iterations and sharing compressed intermediate results periodically. Our work provides the first theoretical analysis of the effect message compression has on distributed training over vertically partitioned data. 
We prove convergence of non-convex objectives at a rate of $O(\frac{1}{\sqrt{T}})$ when the compression error is bounded over the course of training. We provide specific requirements for convergence with common compression techniques, such as quantization and top-$k$ sparsification. Finally, we experimentally show compression can reduce communication by over 90% without a significant decrease in accuracy over VFL without compression.' volume: 162 URL: https://proceedings.mlr.press/v162/castiglia22a.html PDF: https://proceedings.mlr.press/v162/castiglia22a/castiglia22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-castiglia22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Timothy J family: Castiglia - given: Anirban family: Das - given: Shiqiang family: Wang - given: Stacy family: Patterson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2738-2766 id: castiglia22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2738 lastpage: 2766 published: 2022-06-28 00:00:00 +0000 - title: 'Online Learning with Knapsacks: the Best of Both Worlds' abstract: 'We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints. By casting the learning process over a suitably defined space of strategy mixtures, we recover strong duality on a Lagrangian relaxation of the underlying optimization problem, even for general settings with non-convex reward and resource-consumption functions. Then, we provide the first best-of-both-worlds type framework for this setting, with no-regret guarantees both under stochastic and adversarial inputs. Our framework yields the same regret guarantees as prior work in the stochastic case. On the other hand, when budgets grow at least linearly in the time horizon, it allows us to provide a constant competitive ratio in the adversarial case, which improves over the $O(m \log T)$ competitive ratio of Immorlica et al. [FOCS’19]. Moreover, our framework allows the decision maker to handle non-convex reward and cost functions. We provide two game-theoretic applications of our framework to give further evidence of its flexibility.' volume: 162 URL: https://proceedings.mlr.press/v162/castiglioni22a.html PDF: https://proceedings.mlr.press/v162/castiglioni22a/castiglioni22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-castiglioni22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Matteo family: Castiglioni - given: Andrea family: Celli - given: Christian family: Kroer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2767-2783 id: castiglioni22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2767 lastpage: 2783 published: 2022-06-28 00:00:00 +0000 - title: 'Stabilizing Off-Policy Deep Reinforcement Learning from Pixels' abstract: 'Off-policy reinforcement learning (RL) from pixel observations is notoriously unstable. 
As a result, many successful algorithms must combine different domain-specific practices and auxiliary losses to learn meaningful behaviors in complex environments. In this work, we provide novel analysis demonstrating that these instabilities arise from performing temporal-difference learning with a convolutional encoder and low-magnitude rewards. We show that this new visual deadly triad causes unstable training and premature convergence to degenerate solutions, a phenomenon we name catastrophic self-overfitting. Based on our analysis, we propose A-LIX, a method providing adaptive regularization to the encoder’s gradients that explicitly prevents the occurrence of catastrophic self-overfitting using a dual objective. By applying A-LIX, we significantly outperform the prior state-of-the-art on the DeepMind Control and Atari benchmarks without any data augmentation or auxiliary losses.' volume: 162 URL: https://proceedings.mlr.press/v162/cetin22a.html PDF: https://proceedings.mlr.press/v162/cetin22a/cetin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cetin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Edoardo family: Cetin - given: Philip J family: Ball - given: Stephen family: Roberts - given: Oya family: Celiktutan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2784-2810 id: cetin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2784 lastpage: 2810 published: 2022-06-28 00:00:00 +0000 - title: 'Accelerated, Optimal and Parallel: Some results on model-based stochastic optimization' abstract: 'The Approximate-Proximal Point (APROX) family of model-based stochastic optimization algorithms improve over standard stochastic gradient methods, as they are robust to step size choices, adaptive to problem difficulty, converge on a broader range of problems than stochastic gradient methods, and converge very fast on interpolation problems, all while retaining nice minibatching properties \cite{AsiDu19siopt,AsiChChDu20}. In this paper, we propose an acceleration scheme for the APROX family and provide non-asymptotic convergence guarantees, which are order-optimal in all problem-dependent constants and provide even larger minibatching speedups. For interpolation problems where the objective satisfies additional growth conditions, we show that our algorithm achieves linear convergence rates for a wide range of stepsizes. In this setting, we also prove matching lower bounds, identifying new fundamental constants and showing the optimality of the APROX family. We corroborate our theoretical results with empirical testing to demonstrate the gains accurate modeling, acceleration, and minibatching provide.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chadha22a.html PDF: https://proceedings.mlr.press/v162/chadha22a/chadha22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chadha22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Karan family: Chadha - given: Gary family: Cheng - given: John family: Duchi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2811-2827 id: chadha22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2811 lastpage: 2827 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Imitation Learning against Variations in Environment Dynamics' abstract: 'In this paper, we propose a robust imitation learning (IL) framework that improves the robustness of IL when environment dynamics are perturbed. The existing IL framework trained in a single environment can catastrophically fail with perturbations in environment dynamics because it does not capture the situation that underlying environment dynamics can be changed. Our framework effectively deals with environments with varying dynamics by imitating multiple experts in sampled environment dynamics to enhance the robustness in general variations in environment dynamics. In order to robustly imitate the multiple sample experts, we minimize the risk with respect to the Jensen-Shannon divergence between the agent’s policy and each of the sample experts. Numerical results show that our algorithm significantly improves robustness against dynamics perturbations compared to conventional IL baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/chae22a.html PDF: https://proceedings.mlr.press/v162/chae22a/chae22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chae22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jongseong family: Chae - given: Seungyul family: Han - given: Whiyoung family: Jung - given: Myungsik family: Cho - given: Sungho family: Choi - given: Youngchul family: Sung editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2828-2852 id: chae22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2828 lastpage: 2852 published: 2022-06-28 00:00:00 +0000 - title: 'Fairness with Adaptive Weights' abstract: 'Fairness is now an important issue in machine learning. There are arising concerns that automated decision-making systems reflect real-world biases. Although a wide range of fairness-related methods have been proposed in recent years, the under-representation problem has been less studied. Due to the uneven distribution of samples from different populations, machine learning models tend to be biased against minority groups when trained by minimizing the average empirical risk across all samples. In this paper, we propose a novel adaptive reweighing method to address representation bias. The goal of our method is to achieve group-level balance among different demographic groups by learning adaptive weights for each sample. 
Our approach places greater emphasis on error-prone samples during prediction and promotes adequate representation of minority groups for fairness. We derive a closed-form solution for adaptive weight assignment and propose an efficient algorithm with theoretical convergence guarantees. We theoretically analyze the fairness of our model and empirically verify that our method strikes a balance between fairness and accuracy. In experiments, our method achieves comparable or better performance than state-of-the-art methods in both classification and regression tasks. Furthermore, our method exhibits robustness to label noise on various benchmark datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/chai22a.html PDF: https://proceedings.mlr.press/v162/chai22a/chai22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chai22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junyi family: Chai - given: Xiaoqian family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2853-2866 id: chai22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2853 lastpage: 2866 published: 2022-06-28 00:00:00 +0000 - title: 'UNIREX: A Unified Learning Framework for Language Model Rationale Extraction' abstract: 'An extractive rationale explains a language model’s (LM’s) prediction on a given task instance by highlighting the text inputs that most influenced the prediction. Ideally, rationale extraction should be faithful (reflective of LM’s actual behavior) and plausible (convincing to humans), without compromising the LM’s (i.e., task model’s) task performance. Although attribution algorithms and select-predict pipelines are commonly used in rationale extraction, they both rely on certain heuristics that hinder them from satisfying all three desiderata. In light of this, we propose UNIREX, a flexible learning framework which generalizes rationale extractor optimization as follows: (1) specify the architecture for a learned rationale extractor; (2) select explainability objectives (i.e., faithfulness and plausibility criteria); and (3) jointly train the task model and rationale extractor on the task using the selected objectives. UNIREX enables replacing prior works’ heuristic design choices with a generic learned rationale extractor in (1) and optimizing it for all three desiderata in (2)-(3). To facilitate comparison between methods w.r.t. multiple desiderata, we introduce the Normalized Relative Gain (NRG) metric. On five English text classification datasets, our best UNIREX configuration outperforms baselines by an average of 32.9% NRG. Plus, UNIREX rationale extractors’ faithfulness can even generalize to unseen datasets and tasks.'
volume: 162 URL: https://proceedings.mlr.press/v162/chan22a.html PDF: https://proceedings.mlr.press/v162/chan22a/chan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aaron family: Chan - given: Maziar family: Sanjabi - given: Lambert family: Mathias - given: Liang family: Tan - given: Shaoliang family: Nie - given: Xiaochang family: Peng - given: Xiang family: Ren - given: Hamed family: Firooz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2867-2889 id: chan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2867 lastpage: 2889 published: 2022-06-28 00:00:00 +0000 - title: 'Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?' abstract: 'This work investigates the compatibility between label smoothing (LS) and knowledge distillation (KD). Contemporary findings addressing this thesis statement take dichotomous standpoints: Muller et al. (2019) and Shen et al. (2021b). Critically, there is no effort to understand and resolve these contradictory findings, leaving the primal question - to smooth or not to smooth a teacher network? - unanswered. The main contributions of our work are the discovery, analysis and validation of systematic diffusion as the missing concept which is instrumental in understanding and resolving these contradictory findings. This systematic diffusion essentially curtails the benefits of distilling from an LS-trained teacher, thereby rendering KD at increased temperatures ineffective. Our discovery is comprehensively supported by large-scale experiments, analyses and case studies including image classification, neural machine translation and compact student distillation tasks spanning multiple datasets and teacher-student architectures. Based on our analysis, we suggest that practitioners use an LS-trained teacher with a low-temperature transfer to obtain high-performance students.
Code and models are available at https://keshik6.github.io/revisiting-ls-kd-compatibility/' volume: 162 URL: https://proceedings.mlr.press/v162/chandrasegaran22a.html PDF: https://proceedings.mlr.press/v162/chandrasegaran22a/chandrasegaran22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chandrasegaran22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Keshigeyan family: Chandrasegaran - given: Ngoc-Trung family: Tran - given: Yunqing family: Zhao - given: Ngai-Man family: Cheung editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2890-2916 id: chandrasegaran22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2890 lastpage: 2916 published: 2022-06-28 00:00:00 +0000 - title: 'Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models' abstract: 'Controllable generative sequence models with the capability to extract and replicate the style of specific examples enable many applications, including narrating audiobooks in different voices, auto-completing and auto-correcting written handwriting, and generating missing training samples for downstream recognition tasks. However, under an unsupervised-style setting, typical training algorithms for controllable sequence generative models suffer from the training-inference mismatch, where the same sample is used as content and style input during training but unpaired samples are given during inference. In this paper, we tackle the training-inference mismatch encountered during unsupervised learning of controllable generative sequence models. The proposed method is simple yet effective, where we use a style transformation module to transfer target style information into an unrelated style input. This method enables training using unpaired content and style samples and thereby mitigate the training-inference mismatch. We apply style equalization to text-to-speech and text-to-handwriting synthesis on three datasets. We conduct thorough evaluation, including both quantitative and qualitative user studies. Our results show that by mitigating the training-inference mismatch with the proposed style equalization, we achieve style replication scores comparable to real data in our user studies.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chang22a.html PDF: https://proceedings.mlr.press/v162/chang22a/chang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jen-Hao Rick family: Chang - given: Ashish family: Shrivastava - given: Hema family: Koppula - given: Xiaoshuai family: Zhang - given: Oncel family: Tuzel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2917-2937 id: chang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2917 lastpage: 2937 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Bellman Complete Representations for Offline Policy Evaluation' abstract: 'We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations satisfying these conditions are given, with results being mostly theoretical in nature. In this work, we propose BCRL, which directly learns from data an approximately linear Bellman complete representation with good coverage. With this learned representation, we perform OPE using Least Square Policy Evaluation (LSPE) with linear functions in our learned representation. We present an end-to-end theoretical analysis, showing that our two-stage algorithm enjoys polynomial sample complexity provided some representation in the rich class considered is linear Bellman complete. Empirically, we extensively evaluate our algorithm on challenging, image-based continuous control tasks from the Deepmind Control Suite. We show our representation enables better OPE compared to previous representation learning methods developed for off-policy RL (e.g., CURL, SPR). BCRL achieve competitive OPE error with the state-of-the-art method Fitted Q-Evaluation (FQE), and beats FQE when evaluating beyond the initial state distribution. Our ablations show that both linear Bellman complete and coverage components of our method are crucial.' volume: 162 URL: https://proceedings.mlr.press/v162/chang22b.html PDF: https://proceedings.mlr.press/v162/chang22b/chang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jonathan family: Chang - given: Kaiwen family: Wang - given: Nathan family: Kallus - given: Wen family: Sun editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2938-2971 id: chang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 2938 lastpage: 2971 published: 2022-06-28 00:00:00 +0000 - title: 'Sample Efficient Learning of Predictors that Complement Humans' abstract: 'One of the goals of learning algorithms is to complement and reduce the burden on human decision makers. 
The expert deferral setting, wherein an algorithm can either predict on its own or defer the decision to a downstream expert, helps accomplish this goal. A fundamental aspect of this setting is the need to learn complementary predictors that improve on the human’s weaknesses rather than learning predictors optimized for average error. In this work, we provide the first theoretical analysis of the benefit of learning complementary predictors in expert deferral. To enable efficient learning of such predictors, we consider a family of consistent surrogate loss functions for expert deferral and analyze their theoretical properties. Finally, we design active learning schemes that require a minimal amount of human expert prediction data in order to learn accurate deferral systems.' volume: 162 URL: https://proceedings.mlr.press/v162/charusaie22a.html PDF: https://proceedings.mlr.press/v162/charusaie22a/charusaie22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-charusaie22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mohammad-Amin family: Charusaie - given: Hussein family: Mozannar - given: David family: Sontag - given: Samira family: Samadi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 2972-3005 id: charusaie22a issued: date-parts: - 2022 - 6 - 28 firstpage: 2972 lastpage: 3005 published: 2022-06-28 00:00:00 +0000 - title: 'Nyström Kernel Mean Embeddings' abstract: 'Kernel mean embeddings are a powerful tool to represent probability distributions over arbitrary spaces as single points in a Hilbert space. Yet, the cost of computing and storing such embeddings prohibits their direct use in large-scale settings. We propose an efficient approximation procedure based on the Nyström method, which exploits a small random subset of the dataset. Our main result is an upper bound on the approximation error of this procedure. It yields sufficient conditions on the subsample size to obtain the standard $1/\sqrt{n}$ rate while reducing computational costs. We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules, and we illustrate our theoretical findings with numerical experiments.'
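As a rough illustration of the Nyström idea summarized in the abstract above, the sketch below approximates two kernel mean embeddings on a shared set of landmark points and uses them to estimate a squared MMD. The Gaussian kernel, the uniform landmark subsampling, the pseudo-inverse projection, and every name in the snippet are our own illustrative choices, not the authors' implementation or their error analysis.

```python
# Illustrative sketch (not the paper's implementation): approximating kernel
# mean embeddings with the Nystrom method and using them to estimate MMD^2.
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # Pairwise Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_kme(X, landmarks, gamma=1.0):
    # Coefficients alpha such that mu_X ~ sum_j alpha_j k(z_j, .), obtained by
    # projecting the empirical mean embedding onto span{phi(z_j)}.
    K_mm = gaussian_kernel(landmarks, landmarks, gamma)
    K_mn = gaussian_kernel(landmarks, X, gamma)
    return np.linalg.pinv(K_mm) @ K_mn.mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2000, 5))
Y = rng.normal(0.5, 1.0, size=(2000, 5))

# Shared landmarks: a small uniform subsample (size chosen arbitrarily here).
Z = X[rng.choice(len(X), size=50, replace=False)]
alpha, beta = nystrom_kme(X, Z), nystrom_kme(Y, Z)

# With both embeddings expressed in the same landmark basis,
# MMD^2(X, Y) ~ (alpha - beta)^T K_mm (alpha - beta).
K_mm = gaussian_kernel(Z, Z)
mmd2 = (alpha - beta) @ K_mm @ (alpha - beta)
print(f"approximate MMD^2: {mmd2:.4f}")
```

The subsample size (50 landmarks for 2000 points) is arbitrary in this sketch; the paper's contribution is precisely the bound on how large the subsample must be to retain the standard $1/\sqrt{n}$ rate.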
volume: 162 URL: https://proceedings.mlr.press/v162/chatalic22a.html PDF: https://proceedings.mlr.press/v162/chatalic22a/chatalic22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chatalic22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Antoine family: Chatalic - given: Nicolas family: Schreuder - given: Lorenzo family: Rosasco - given: Alessandro family: Rudi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3006-3024 id: chatalic22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3006 lastpage: 3024 published: 2022-06-28 00:00:00 +0000 - title: 'Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets' abstract: 'The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy. Despite many exciting efforts being made, there is one piece of "common sense" that is rarely challenged: a winning ticket is found by iterative magnitude pruning (IMP) and hence the resultant pruned subnetworks have only unstructured sparsity. That gap limits the appeal of winning tickets in practice, since the highly irregular sparse patterns are challenging to accelerate on hardware. Meanwhile, directly substituting structured pruning for unstructured pruning in IMP damages performance more severely and is usually unable to locate winning tickets. In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general. The core idea is to append "post-processing techniques" after each round of (unstructured) IMP, to enforce the formation of structural sparsity. Specifically, we first "re-fill" pruned elements back in some channels deemed to be important, and then "re-group" non-zero elements to create flexible group-wise structural patterns. Both our identified channel- and group-wise structural subnetworks win the lottery, with substantial inference speedups readily supported by existing hardware. Extensive experiments, conducted on diverse datasets across multiple network backbones, consistently validate our proposal, showing that the hardware acceleration roadblock of LTH is now removed. Specifically, the structural winning tickets obtain up to {64.93%, 64.84%, 60.23%} running time savings at {36%-80%, 74%, 58%} sparsity on {CIFAR, Tiny-ImageNet, ImageNet}, while maintaining comparable accuracy. Code is at https://github.com/VITA-Group/Structure-LTH.'
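The abstract above describes appending a "re-fill" post-processing step after each round of unstructured IMP. The snippet below sketches one plausible reading of that step in isolation: turning an unstructured magnitude-pruning mask into a channel-structured one. The channel scoring rule, the kept-channel budget, and all names are our own simplifications; the authors' actual procedure (including the "re-group" step) is in the repository linked above.

```python
# Illustrative "re-fill"-style post-processing: convert an unstructured mask
# into a channel-structured mask by keeping the channels with the most
# surviving weight mass fully dense and removing the rest.
import numpy as np

def refill_channels(weight, mask, keep_ratio=0.5):
    # weight, mask: (out_channels, in_channels, kh, kw)
    out_ch = weight.shape[0]
    # Score each output channel by the L1 mass that survived unstructured pruning.
    scores = np.abs(weight * mask).reshape(out_ch, -1).sum(axis=1)
    n_keep = max(1, int(keep_ratio * out_ch))
    kept = np.argsort(scores)[-n_keep:]
    # Re-fill: kept channels become fully dense, all other channels are removed,
    # yielding a structurally sparse (channel-wise) mask.
    structured = np.zeros_like(mask)
    structured[kept] = 1.0
    return structured

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))
unstructured = (rng.random(w.shape) > 0.8).astype(w.dtype)  # ~80% unstructured sparsity
structured = refill_channels(w, unstructured, keep_ratio=0.25)
print("kept channels:", np.flatnonzero(structured.reshape(8, -1).any(axis=1)))
```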
volume: 162 URL: https://proceedings.mlr.press/v162/chen22a.html PDF: https://proceedings.mlr.press/v162/chen22a/chen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianlong family: Chen - given: Xuxi family: Chen - given: Xiaolong family: Ma - given: Yanzhi family: Wang - given: Zhangyang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3025-3039 id: chen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3025 lastpage: 3039 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Domain Adaptive Object Detection with Probabilistic Teacher' abstract: 'Self-training for unsupervised domain adaptive object detection is a challenging task whose performance depends heavily on the quality of pseudo boxes. Despite the promising results, prior works have largely overlooked the uncertainty of pseudo boxes during self-training. In this paper, we present a simple yet effective framework, termed Probabilistic Teacher (PT), which aims to capture the uncertainty of unlabeled target data from a gradually evolving teacher and guides the learning of a student in a mutually beneficial manner. Specifically, we propose to leverage uncertainty-guided consistency training to promote classification adaptation and localization adaptation, rather than filtering pseudo boxes via an elaborate confidence threshold. In addition, we conduct anchor adaptation in parallel with localization adaptation, since the anchor can be regarded as a learnable parameter. Together with this framework, we also present a novel Entropy Focal Loss (EFL) to further facilitate the uncertainty-guided self-training. Equipped with EFL, PT outperforms all previous baselines by a large margin and achieves new state-of-the-art results.'
Taking into account the constraints imposed by SecAgg, we characterize the fundamental communication cost required to obtain the best accuracy achievable under $\varepsilon$ central DP (i.e. under a fully trusted server and no communication constraints). Our results show that $\tilde{O}\left( \min(n^2\varepsilon^2, d) \right)$ bits per client are both sufficient and necessary, and this fundamental limit can be achieved by a linear scheme based on sparse random projections. This provides a significant improvement relative to state-of-the-art SecAgg distributed DP schemes which use $\tilde{O}(d\log(d/\varepsilon^2))$ bits per client. Empirically, we evaluate our proposed scheme on real-world federated learning tasks. We find that our theoretical analysis is well matched in practice. In particular, we show that we can reduce the communication cost to under $1.78$ bits per parameter in realistic privacy settings without decreasing test-time performance. Our work hence theoretically and empirically specifies the fundamental price of using SecAgg.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22c.html PDF: https://proceedings.mlr.press/v162/chen22c/chen22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wei-Ning family: Chen - given: Christopher A Choquette family: Choo - given: Peter family: Kairouz - given: Ananda Theertha family: Suresh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3056-3089 id: chen22c issued: date-parts: - 2022 - 6 - 28 firstpage: 3056 lastpage: 3089 published: 2022-06-28 00:00:00 +0000 - title: 'Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning' abstract: 'An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse—when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them, but the precise mechanism is poorly understood. We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. Instead, both the correct degree of spread and a mechanism for breaking this invariance are necessary. We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread. Next, we study three mechanisms to break permutation invariance: using a constrained encoder, adding a class-conditional autoencoder, and using data augmentation. We show that the latter two encourage clustering of latent subclasses under more realistic conditions than the former. Using these insights, we show that adding a properly-weighted class-conditional InfoNCE loss and a class-conditional autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer across 5 standard datasets and 4.7 points on worst-group robustness on 3 datasets, setting state-of-the-art on CelebA by 11.5 points.'
volume: 162 URL: https://proceedings.mlr.press/v162/chen22d.html PDF: https://proceedings.mlr.press/v162/chen22d/chen22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mayee family: Chen - given: Daniel Y family: Fu - given: Avanika family: Narayan - given: Michael family: Zhang - given: Zhao family: Song - given: Kayvon family: Fatahalian - given: Christopher family: Re editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3090-3122 id: chen22d issued: date-parts: - 2022 - 6 - 28 firstpage: 3090 lastpage: 3122 published: 2022-06-28 00:00:00 +0000 - title: 'Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk' abstract: 'We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints. Each arm is associated with an unknown law on safety risks and rewards, and the learner’s goal is to maximise reward whilst not playing unsafe arms, as determined by a given threshold on the mean risk. We formulate a pseudo-regret for this setting that enforces this safety constraint in a per-round way by softly penalising any violation, regardless of the gain in reward due to the same. This has practical relevance to scenarios such as clinical trials, where one must maintain safety for each round rather than in an aggregated sense. We describe doubly optimistic strategies for this scenario, which maintain optimistic indices for both safety risk and reward. We show that schema based on both frequentist and Bayesian indices satisfy tight gap-dependent logarithmic regret bounds, and further that these play unsafe arms only logarithmically many times in total. This theoretical analysis is complemented by simulation studies demonstrating the effectiveness of the proposed schema, and probing the domains in which their use is appropriate.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22e.html PDF: https://proceedings.mlr.press/v162/chen22e/chen22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianrui family: Chen - given: Aditya family: Gangrade - given: Venkatesh family: Saligrama editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3123-3148 id: chen22e issued: date-parts: - 2022 - 6 - 28 firstpage: 3123 lastpage: 3148 published: 2022-06-28 00:00:00 +0000 - title: 'On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs' abstract: 'We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where the transition probability function is linear in a predefined feature mapping. Existing UCLK \citep{zhou2020provably} algorithm for this setting only has a regret guarantee, which cannot lead to a tight sample complexity bound. 
In this paper, we extend the uniform-PAC sample complexity from the episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an $\tilde{O}\big(d^2/((1-\gamma)^4\epsilon^2)+1/((1-\gamma)^6\epsilon^2)\big)$ uniform-PAC sample complexity, where $d$ is the dimension of the feature mapping, $\gamma \in(0,1)$ is the discount factor of the MDP and $\epsilon$ is the accuracy parameter. To the best of our knowledge, this is the first $\tilde{O}(1/\epsilon^2)$ sample complexity bound for learning infinite-horizon discounted MDPs with linear function approximation (without access to the generative model).' volume: 162 URL: https://proceedings.mlr.press/v162/chen22f.html PDF: https://proceedings.mlr.press/v162/chen22f/chen22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuanzhou family: Chen - given: Jiafan family: He - given: Quanquan family: Gu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3149-3183 id: chen22f issued: date-parts: - 2022 - 6 - 28 firstpage: 3149 lastpage: 3183 published: 2022-06-28 00:00:00 +0000 - title: 'Streaming Algorithms for Support-Aware Histograms' abstract: 'Histograms, i.e., piece-wise constant approximations, are a popular tool used to represent data distributions. Traditionally, the difference between the histogram and the underlying distribution (i.e., the approximation error) is measured using the $L_p$ norm, which sums the differences between the two functions over all items in the domain. Although useful in many applications, the drawback of this error measure is that it treats approximation errors of all items in the same way, irrespective of whether the mass of an item is important for the downstream application that uses the approximation. As a result, even relatively simple distributions cannot be approximated by succinct histograms without incurring large error. In this paper, we address this issue by adapting the definition of approximation so that only the errors of the items that belong to the support of the distribution are considered. Under this definition, we develop efficient 1-pass and 2-pass streaming algorithms that compute near-optimal histograms in sub-linear space. We also present lower bounds on the space complexity of this problem. Surprisingly, under this notion of error, there is an exponential gap in the space complexity of 1-pass and 2-pass streaming algorithms. Finally, we demonstrate the utility of our algorithms on a collection of real and synthetic data sets.'
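To make the error definition in the support-aware histogram abstract concrete, the toy snippet below compares the usual $L_1$ error over the whole domain with a variant that only charges error on items in the distribution's support. The bucket construction and all parameter choices are our own illustration of the two error measures, not the paper's streaming algorithms.

```python
# Illustrative comparison of full-domain L1 error vs. a "support-aware" error
# that only charges items in the support of the distribution.
import numpy as np

def piecewise_constant(values, boundaries):
    # Build a histogram approximation: average `values` within each bucket.
    approx = np.empty_like(values, dtype=float)
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        approx[lo:hi] = values[lo:hi].mean()
    return approx

domain = 1000
freq = np.zeros(domain)
support = np.arange(0, domain, 50)                  # sparse support: every 50th item
freq[support] = np.random.default_rng(0).random(len(support))
freq /= freq.sum()

buckets = np.linspace(0, domain, num=11, dtype=int)  # a 10-bucket histogram
hist = piecewise_constant(freq, buckets)

l1_error = np.abs(freq - hist).sum()                 # charges every item in the domain
support_error = np.abs(freq - hist)[support].sum()   # charges support items only
print(f"L1 error: {l1_error:.4f}, support-aware error: {support_error:.4f}")
```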
volume: 162 URL: https://proceedings.mlr.press/v162/chen22g.html PDF: https://proceedings.mlr.press/v162/chen22g/chen22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Justin family: Chen - given: Piotr family: Indyk - given: Tal family: Wagner editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3184-3203 id: chen22g issued: date-parts: - 2022 - 6 - 28 firstpage: 3184 lastpage: 3203 published: 2022-06-28 00:00:00 +0000 - title: 'Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP' abstract: 'We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). Our first algorithm is computationally efficient and achieves a regret bound $O(\sqrt{d^3B_{\star}^2T_{\star} K})$, where $d$ is the dimension of the feature space, $B_{\star}$ and $T_{\star}$ are upper bounds of the expected costs and hitting time of the optimal policy respectively, and $K$ is the number of episodes. The same algorithm with a slight modification also achieves logarithmic regret of order $O(\frac{d^3B_{\star}^4}{c_{\min}^2\text{\rm gap}_{\min} }\ln^5\frac{dB_{\star} K}{c_{\min}})$, where $\text{\rm gap}_{\min}$ is the minimum sub-optimality gap and $c_{\min}$ is the minimum cost over all state-action pairs. Our result is obtained by developing a simpler and improved analysis for the finite-horizon approximation of (Cohen et al., 2021) with a smaller approximation error, which might be of independent interest. On the other hand, using variance-aware confidence sets in a global optimization problem, our second algorithm is computationally inefficient but achieves the first “horizon-free” regret bound $O(d^{3.5}B_{\star}\sqrt{K})$ with no polynomial dependency on $T_{\star}$ or $1/c_{\min}$, almost matching the $\Omega(dB_{\star}\sqrt{K})$ lower bound from (Min et al., 2021).' volume: 162 URL: https://proceedings.mlr.press/v162/chen22h.html PDF: https://proceedings.mlr.press/v162/chen22h/chen22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liyu family: Chen - given: Rahul family: Jain - given: Haipeng family: Luo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3204-3245 id: chen22h issued: date-parts: - 2022 - 6 - 28 firstpage: 3204 lastpage: 3245 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Infinite-horizon Average-reward Markov Decision Process with Constraints' abstract: 'We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. 
We start by designing a policy optimization algorithm with a carefully designed action-value estimator and bonus term, and show that for ergodic MDPs, our algorithm ensures $O(\sqrt{T})$ regret and constant constraint violation, where $T$ is the total number of time steps. This strictly improves over the algorithm of (Singh et al., 2020), whose regret and constraint violation are both $O(T^{2/3})$. Next, we consider the most general class of weakly communicating MDPs. Through a finite-horizon approximation, we develop another algorithm with $O(T^{2/3})$ regret and constraint violation, which can be further improved to $O(\sqrt{T})$ via a simple modification, albeit making the algorithm computationally inefficient. As far as we know, these are the first set of provable algorithms for weakly communicating MDPs with cost constraints.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22i.html PDF: https://proceedings.mlr.press/v162/chen22i/chen22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liyu family: Chen - given: Rahul family: Jain - given: Haipeng family: Luo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3246-3270 id: chen22i issued: date-parts: - 2022 - 6 - 28 firstpage: 3246 lastpage: 3270 published: 2022-06-28 00:00:00 +0000 - title: 'Active Multi-Task Representation Learning' abstract: 'To leverage the power of big data from source domains and overcome the scarcity of target domain samples, representation learning based on multi-task pretraining has become a standard approach in many applications. However, large-scale pretraining is often computationally expensive and not affordable for small organizations. When there is only one target task, most source tasks can be irrelevant, and we can actively sample a subset of source data from the most relevant ones. However, up until now, choosing which source tasks to include in multi-task learning has been more art than science. In this paper, we give the first formal study of source task sampling by leveraging techniques from active learning. We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance. Theoretically, we show that for the linear representation class, to achieve the same error rate, our algorithm can save up to a factor of the number of source tasks in the source task sample complexity, compared with naive uniform sampling from all source tasks. We also provide experiments on real-world computer vision datasets to illustrate the effectiveness of our proposed method on both linear and convolutional neural network representation classes. We believe our paper serves as an important initial step to bring techniques from active learning to representation learning.'
volume: 162 URL: https://proceedings.mlr.press/v162/chen22j.html PDF: https://proceedings.mlr.press/v162/chen22j/chen22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yifang family: Chen - given: Kevin family: Jamieson - given: Simon family: Du editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3271-3298 id: chen22j issued: date-parts: - 2022 - 6 - 28 firstpage: 3271 lastpage: 3298 published: 2022-06-28 00:00:00 +0000 - title: 'On Collective Robustness of Bagging Against Data Poisoning' abstract: 'Bootstrap aggregating (bagging) is an effective ensemble protocol, which is believed to enhance robustness through its majority voting mechanism. Recent works further prove sample-wise robustness certificates for certain forms of bagging (e.g. partition aggregation). Beyond these particular forms, in this paper, we propose the first collective certification for general bagging to compute tight robustness against global poisoning attacks. Specifically, we compute the maximum number of simultaneously changed predictions via solving a binary integer linear programming (BILP) problem. Then we analyze the robustness of vanilla bagging and give an upper bound on the tolerable poisoning budget. Based on this analysis, we propose hash bagging to improve the robustness of vanilla bagging almost for free. This is achieved by modifying the random subsampling in vanilla bagging to a hash-based deterministic subsampling, as a way of controlling the influence scope for each poisoning sample universally. Our extensive experiments show a notable advantage in terms of applicability and robustness. Our code is available at https://github.com/Emiyalzn/ICML22-CRB.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22k.html PDF: https://proceedings.mlr.press/v162/chen22k/chen22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruoxin family: Chen - given: Zenan family: Li - given: Jie family: Li - given: Junchi family: Yan - given: Chentao family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3299-3319 id: chen22k issued: date-parts: - 2022 - 6 - 28 firstpage: 3299 lastpage: 3319 published: 2022-06-28 00:00:00 +0000 - title: 'Online Active Regression' abstract: 'Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether it should collect the corresponding labels. The goal is to efficiently maintain the regression of received data points with a small budget of label queries.
We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our proposed algorithms require only $\tilde{\mathcal{O}}(d/\mathrm{poly}(\epsilon))$ label queries. Numerical results verify our theoretical findings and show that our methods perform comparably to offline active regression algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22l.html PDF: https://proceedings.mlr.press/v162/chen22l/chen22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Cheng family: Chen - given: Yi family: Li - given: Yiming family: Sun editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3320-3335 id: chen22l issued: date-parts: - 2022 - 6 - 28 firstpage: 3320 lastpage: 3335 published: 2022-06-28 00:00:00 +0000 - title: 'Selling Data To a Machine Learner: Pricing via Costly Signaling' abstract: 'We consider a new problem of selling data to a machine learner who looks to purchase data to train his machine learning model. A key challenge in this setup is that neither the seller nor the machine learner knows the true quality of data. When designing a revenue-maximizing mechanism, a data seller faces the tradeoff between the cost and precision of data quality estimation. To address this challenge, we study a natural class of mechanisms that price data via costly signaling. Motivated by the assumption of i.i.d. data points as in classic machine learning models, we first consider selling homogeneous data and derive an optimal selling mechanism. We then turn to the sale of heterogeneous data, motivated by the sale of multiple data sets, and show that 1) on the negative side, it is NP-hard to approximate the optimal mechanism within a constant ratio $e/(e+1) + o(1)$; while 2) on the positive side, there is a $1/k$-approximate algorithm, where $k$ is the number of the machine learner’s private types.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22m.html PDF: https://proceedings.mlr.press/v162/chen22m/chen22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junjie family: Chen - given: Minming family: Li - given: Haifeng family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3336-3359 id: chen22m issued: date-parts: - 2022 - 6 - 28 firstpage: 3336 lastpage: 3359 published: 2022-06-28 00:00:00 +0000 - title: 'ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases' abstract: 'Electrocardiogram (ECG) is a widely used non-invasive diagnostic tool for heart diseases. Many studies have devised ECG analysis models (e.g., classifiers) to assist diagnosis. As an upstream task, researchers have built generative models to synthesize ECG data, which are beneficial to providing training samples, privacy protection, and annotation reduction.
However, previous generative methods for ECG typically neither synthesized multi-view data nor accounted for heart disease conditions. In this paper, we propose a novel disease-aware generative adversarial network for multi-view ECG synthesis called ME-GAN, which attains panoptic electrocardio representations conditioned on heart diseases and projects the representations onto multiple standard views to yield ECG signals. Since ECG manifestations of heart diseases are often localized in specific waveforms, we propose a new "mixup normalization" to inject disease information precisely into suitable locations. In addition, we propose a "view discriminator" to revert disordered ECG views into a pre-determined order, supervising the generator to produce ECGs with the correct view characteristics. Besides, a new metric, rFID, is presented to assess the quality of the synthesized ECG signals. Comprehensive experiments verify that our ME-GAN performs well on multi-view ECG signal synthesis with reliable disease manifestations.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22n.html PDF: https://proceedings.mlr.press/v162/chen22n/chen22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jintai family: Chen - given: Kuanlun family: Liao - given: Kun family: Wei - given: Haochao family: Ying - given: Danny Z family: Chen - given: Jian family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3360-3370 id: chen22n issued: date-parts: - 2022 - 6 - 28 firstpage: 3360 lastpage: 3370 published: 2022-06-28 00:00:00 +0000 - title: 'Weisfeiler-Lehman Meets Gromov-Wasserstein' abstract: 'The Weisfeiler-Lehman (WL) test is a classical procedure for graph isomorphism testing. The WL test has also been widely used both for designing graph kernels and for analyzing graph neural networks. In this paper, we propose the Weisfeiler-Lehman (WL) distance, a notion of distance between labeled measure Markov chains (LMMCs), of which labeled graphs are special cases. The WL distance is polynomial time computable and is also compatible with the WL test in the sense that the former is positive if and only if the WL test can distinguish the two involved graphs. The WL distance captures and compares subtle structures of the underlying LMMCs and, as a consequence of this, it is more discriminating than the distance between graphs used for defining the state-of-the-art Wasserstein Weisfeiler-Lehman graph kernel. Inspired by the structure of the WL distance, we identify a neural network architecture on LMMCs which turns out to be universal w.r.t. continuous functions defined on the space of all LMMCs (which includes all graphs) endowed with the WL distance. Finally, the WL distance turns out to be stable w.r.t. a natural variant of the Gromov-Wasserstein (GW) distance for comparing metric Markov chains that we identify. Hence, the WL distance can also be construed as a polynomial time lower bound for the GW distance, which is in general NP-hard to compute.'
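The abstract above builds on the classical Weisfeiler-Lehman test, so a minimal sketch of 1-WL color refinement may help situate it. This is the textbook isomorphism-testing heuristic, not the paper's WL distance between labeled measure Markov chains; hashing the refined labels is a common implementation shortcut, and all names below are ours.

```python
# Classical 1-WL color refinement: repeatedly relabel each node by its own
# color plus the multiset of its neighbours' colors, then compare histograms.
from collections import Counter

def wl_colors(adj, labels, rounds=3):
    # adj: dict node -> list of neighbours; labels: dict node -> initial label.
    colors = dict(labels)
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels = {i: 0 for i in range(6)}
# A 6-cycle and two disjoint triangles get identical color histograms:
# the standard example of graphs that 1-WL cannot distinguish.
print(wl_colors(cycle6, labels) == wl_colors(triangles, labels))  # True
```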
volume: 162 URL: https://proceedings.mlr.press/v162/chen22o.html PDF: https://proceedings.mlr.press/v162/chen22o/chen22o.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22o.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samantha family: Chen - given: Sunhyuk family: Lim - given: Facundo family: Memoli - given: Zhengchao family: Wan - given: Yusu family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3371-3416 id: chen22o issued: date-parts: - 2022 - 6 - 28 firstpage: 3371 lastpage: 3416 published: 2022-06-28 00:00:00 +0000 - title: 'On Non-local Convergence Analysis of Deep Linear Networks' abstract: 'In this paper, we study the non-local convergence properties of deep linear networks. Specifically, under the quadratic loss, we consider optimizing deep linear networks in which there is at least a layer with only one neuron. We describe the convergent point of trajectories with an arbitrary balanced starting point under gradient flow, including the paths which converge to one of the saddle points. We also show specific convergence rates of trajectories that converge to the global minimizers by stages. We conclude that the rates vary from polynomial to linear. As far as we know, our results are the first to give a non-local analysis of deep linear neural networks with arbitrary balanced initialization, rather than the lazy training regime which has dominated the literature on neural networks or the restricted benign initialization.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22p.html PDF: https://proceedings.mlr.press/v162/chen22p/chen22p.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22p.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kun family: Chen - given: Dachao family: Lin - given: Zhihua family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3417-3443 id: chen22p issued: date-parts: - 2022 - 6 - 28 firstpage: 3417 lastpage: 3443 published: 2022-06-28 00:00:00 +0000 - title: 'Flow-based Recurrent Belief State Learning for POMDPs' abstract: 'Partially Observable Markov Decision Process (POMDP) provides a principled and generic framework to model real world sequential decision making processes but yet remains unsolved, especially for high dimensional continuous space and unknown models. The main challenge lies in how to accurately obtain the belief state, which is the probability distribution over the unobservable environment states given historical information. Accurately calculating this belief state is a precondition for obtaining an optimal policy of POMDPs. Recent advances in deep learning techniques show great potential to learn good belief states. However, existing methods can only learn approximated distribution with limited flexibility. 
In this paper, we introduce the \textbf{F}l\textbf{O}w-based \textbf{R}ecurrent \textbf{BE}lief \textbf{S}tate model (FORBES), which incorporates normalizing flows into the variational inference to learn general continuous belief states for POMDPs. Furthermore, we show that the learned belief states can be plugged into downstream RL algorithms to improve performance. In experiments, we show that our methods successfully capture the complex belief states that enable multi-modal predictions as well as high quality reconstructions, and results on challenging visual-motor control tasks show that our method achieves superior performance and sample efficiency.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22q.html PDF: https://proceedings.mlr.press/v162/chen22q/chen22q.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22q.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaoyu family: Chen - given: Yao Mark family: Mu - given: Ping family: Luo - given: Shengbo family: Li - given: Jianyu family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3444-3468 id: chen22q issued: date-parts: - 2022 - 6 - 28 firstpage: 3444 lastpage: 3468 published: 2022-06-28 00:00:00 +0000 - title: 'Structure-Aware Transformer for Graph Representation Learning' abstract: 'The Transformer architecture has gained growing attention in graph representation learning recently, as it naturally overcomes several limitations of graph neural networks (GNNs) by avoiding their strict structural inductive biases and instead only encoding the graph structure via positional encoding. Here, we show that the node representations generated by the Transformer with positional encoding do not necessarily capture structural similarity between them. To address this issue, we propose the Structure-Aware Transformer, a class of simple and flexible graph Transformers built upon a new self-attention mechanism. This new self-attention incorporates structural information into the original self-attention by extracting a subgraph representation rooted at each node before computing the attention. We propose several methods for automatically generating the subgraph representation and show theoretically that the resulting representations are at least as expressive as the subgraph representations. Empirically, our method achieves state-of-the-art performance on five graph prediction benchmarks. Our structure-aware framework can leverage any existing GNN to extract the subgraph representation, and we show that it systematically improves performance relative to the base GNN model, successfully combining the advantages of GNNs and Transformers. Our code is available at https://github.com/BorgwardtLab/SAT.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chen22r.html PDF: https://proceedings.mlr.press/v162/chen22r/chen22r.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22r.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dexiong family: Chen - given: Leslie family: O’Bray - given: Karsten family: Borgwardt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3469-3489 id: chen22r issued: date-parts: - 2022 - 6 - 28 firstpage: 3469 lastpage: 3489 published: 2022-06-28 00:00:00 +0000 - title: 'The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure Aggregation' abstract: 'We introduce the Poisson Binomial mechanism (PBM), a discrete differential privacy mechanism for distributed mean estimation (DME) with applications to federated learning and analytics. We provide a tight analysis of its privacy guarantees, showing that it achieves the same privacy-accuracy trade-offs as the continuous Gaussian mechanism. Our analysis is based on a novel bound on the Rényi divergence of two Poisson binomial distributions that may be of independent interest. Unlike previous discrete DP schemes based on additive noise, our mechanism encodes local information into a parameter of the binomial distribution, and hence the output distribution is discrete with bounded support. Moreover, the support does not increase as the privacy budget goes to zero as in the case of additive schemes which require the addition of more noise to achieve higher privacy; on the contrary, the support becomes smaller as $\varepsilon$ goes to zero. The bounded support enables us to combine our mechanism with secure aggregation (SecAgg), a multi-party cryptographic protocol, without the need for modular clipping, which results in an unbiased estimator of the sum of the local vectors. This in turn allows us to apply it in the private FL setting and provide an upper bound on the convergence rate of the SGD algorithm. Moreover, since the support of the output distribution becomes smaller as $\varepsilon \to 0$, the communication cost of our scheme decreases with the privacy constraint $\varepsilon$, outperforming all previous distributed DP schemes based on additive noise in the high privacy or low communication regimes.'
volume: 162 URL: https://proceedings.mlr.press/v162/chen22s.html PDF: https://proceedings.mlr.press/v162/chen22s/chen22s.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22s.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wei-Ning family: Chen - given: Ayfer family: Ozgur - given: Peter family: Kairouz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3490-3506 id: chen22s issued: date-parts: - 2022 - 6 - 28 firstpage: 3490 lastpage: 3506 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Mixtures of Linear Dynamical Systems' abstract: 'We study the problem of learning a mixture of multiple linear dynamical systems (LDSs) from unlabeled short sample trajectories, each generated by one of the LDS models. Despite the wide applicability of mixture models for time-series data, learning algorithms that come with end-to-end performance guarantees are largely absent from existing literature. There are multiple sources of technical challenges, including but not limited to (1) the presence of latent variables (i.e. the unknown labels of trajectories); (2) the possibility that the sample trajectories might have lengths much smaller than the dimension $d$ of the LDS models; and (3) the complicated temporal dependence inherent to time-series data. To tackle these challenges, we develop a two-stage meta-algorithm, which is guaranteed to efficiently recover each ground-truth LDS model up to error $\tilde{O}(\sqrt{d/T})$, where $T$ is the total sample size. We validate our theoretical studies with numerical experiments, confirming the efficacy of the proposed algorithm.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22t.html PDF: https://proceedings.mlr.press/v162/chen22t/chen22t.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22t.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yanxi family: Chen - given: H. Vincent family: Poor editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3507-3557 id: chen22t issued: date-parts: - 2022 - 6 - 28 firstpage: 3507 lastpage: 3557 published: 2022-06-28 00:00:00 +0000 - title: 'On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation' abstract: 'We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that under one mild condition the NPIV formulation of $Q$-function estimation is well-posed in the sense of $L^2$-measure of ill-posedness with respect to the data generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature for obtaining the $L^2$ convergence rates of various $Q$-function estimators. 
Thanks to this new well-posed property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for the classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on the well-posedness and the minimax lower bounds are of independent interest to study not only other nonparametric estimators for $Q$-function but also efficient estimation on the value of any target policy in off-policy settings.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22u.html PDF: https://proceedings.mlr.press/v162/chen22u/chen22u.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22u.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaohong family: Chen - given: Zhengling family: Qi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3558-3582 id: chen22u issued: date-parts: - 2022 - 6 - 28 firstpage: 3558 lastpage: 3582 published: 2022-06-28 00:00:00 +0000 - title: 'Faster Fundamental Graph Algorithms via Learned Predictions' abstract: 'We consider the question of speeding up classic graph algorithms with machine-learned predictions. In this model, algorithms are furnished with extra advice learned from past or similar instances. Given the additional information, we aim to improve upon the traditional worst-case run-time guarantees. Our contributions are the following: (i) We give a faster algorithm for minimum-weight bipartite matching via learned duals, improving the recent result by Dinitz, Im, Lavastida, Moseley and Vassilvitskii (NeurIPS, 2021); (ii) We extend the learned dual approach to the single-source shortest path problem (with negative edge lengths), achieving an almost linear runtime given sufficiently accurate predictions which improves upon the classic fastest algorithm due to Goldberg (SIAM J. Comput., 1995); (iii) We provide a general reduction-based framework for learning-based graph algorithms, leading to new algorithms for degree-constrained subgraph and minimum-cost 0-1 flow, based on reductions to bipartite matching and the shortest path problem. Finally, we give a set of general learnability theorems, showing that the predictions required by our algorithms can be efficiently learned in a PAC fashion.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chen22v.html PDF: https://proceedings.mlr.press/v162/chen22v/chen22v.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22v.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Justin family: Chen - given: Sandeep family: Silwal - given: Ali family: Vakilian - given: Fred family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3583-3602 id: chen22v issued: date-parts: - 2022 - 6 - 28 firstpage: 3583 lastpage: 3602 published: 2022-06-28 00:00:00 +0000 - title: 'Improve Single-Point Zeroth-Order Optimization Using High-Pass and Low-Pass Filters' abstract: 'Single-point zeroth-order optimization (SZO) is useful in solving online black-box optimization and control problems in time-varying environments, as it queries the function value only once at each time step. However, the vanilla SZO method is known to suffer from a large estimation variance and slow convergence, which seriously limits its practical application. In this work, we borrow the idea of high-pass and low-pass filters from extremum seeking control (the continuous-time version of SZO) and develop a novel SZO method called HLF-SZO by integrating these filters. It turns out that the high-pass filter coincides with the residual feedback method, and the low-pass filter can be interpreted as the momentum method. As a result, the proposed HLF-SZO achieves a much smaller variance and much faster convergence than the vanilla SZO method, and empirically outperforms the residual-feedback SZO method, as verified via extensive numerical experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22w.html PDF: https://proceedings.mlr.press/v162/chen22w/chen22w.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22w.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xin family: Chen - given: Yujie family: Tang - given: Na family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3603-3620 id: chen22w issued: date-parts: - 2022 - 6 - 28 firstpage: 3603 lastpage: 3620 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Variational Graph Convolutional Recurrent Network for Multivariate Time Series Anomaly Detection' abstract: 'Anomaly detection within multivariate time series (MTS) is an essential task in both data mining and service quality management. Many recent works on anomaly detection focus on designing unsupervised probabilistic models to extract robust normal patterns of MTS. In this paper, we model sensor dependency and stochasticity within MTS by developing an embedding-guided probabilistic generative network. We combine it with an adaptive variational graph convolutional recurrent network (VGCRN) to model both spatial and temporal fine-grained correlations in MTS.
To explore hierarchical latent representations, we further extend VGCRN into a deep variational network, which captures multilevel information at different layers and is robust to noisy time series. Moreover, we develop an upward-downward variational inference scheme that considers both forecasting-based and reconstruction-based losses, achieving an accurate posterior approximation of latent variables with better MTS representations. The experiments verify the superiority of the proposed method over current state-of-the-art methods.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22x.html PDF: https://proceedings.mlr.press/v162/chen22x/chen22x.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22x.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wenchao family: Chen - given: Long family: Tian - given: Bo family: Chen - given: Liang family: Dai - given: Zhibin family: Duan - given: Mingyuan family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3621-3633 id: chen22x issued: date-parts: - 2022 - 6 - 28 firstpage: 3621 lastpage: 3633 published: 2022-06-28 00:00:00 +0000 - title: 'Auxiliary Learning with Joint Task and Data Scheduling' abstract: 'Existing auxiliary learning approaches only consider the relationships between the target task and the auxiliary tasks, ignoring the fact that data samples within an auxiliary task could contribute differently to the target task, which results in inefficient auxiliary information usage and non-robustness to data noise. In this paper, we propose to learn a joint task and data schedule for auxiliary learning, which captures the importance of different data samples in each auxiliary task to the target task. However, learning such a joint schedule is challenging due to the large number of additional parameters required for the schedule. To tackle the challenge, we propose a joint task and data scheduling (JTDS) model for auxiliary learning. The JTDS model captures the joint task-data importance through a task-data scheduler, which creates a mapping from task, feature and label information to the schedule in a parameter-efficient way. Particularly, we formulate the scheduler and the task learning process as a bi-level optimization problem. In the lower optimization, the task learning model is updated with the scheduled gradient, while in the upper optimization, the task-data scheduler is updated with the implicit gradient. Experimental results show that our JTDS model significantly outperforms the state-of-the-art methods under supervised, semi-supervised and corrupted label settings.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chen22y.html PDF: https://proceedings.mlr.press/v162/chen22y/chen22y.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22y.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hong family: Chen - given: Xin family: Wang - given: Chaoyu family: Guan - given: Yue family: Liu - given: Wenwu family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3634-3647 id: chen22y issued: date-parts: - 2022 - 6 - 28 firstpage: 3634 lastpage: 3647 published: 2022-06-28 00:00:00 +0000 - title: 'Optimization-Induced Graph Implicit Nonlinear Diffusion' abstract: 'Due to the over-smoothing issue, most existing graph neural networks can only capture limited dependencies with their inherently finite aggregation layers. To overcome this limitation, we propose a new kind of graph convolution, called Graph Implicit Nonlinear Diffusion (GIND), which implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing. Notably, we show that the learned representation can be formalized as the minimizer of an explicit convex optimization objective. With this property, we can theoretically characterize the equilibrium of our GIND from an optimization perspective. More interestingly, we can induce new structural variants by modifying the corresponding optimization objective. To be specific, we can embed prior properties to the equilibrium, as well as introducing skip connections to promote training stability. Extensive experiments show that GIND is good at capturing long-range dependencies, and performs well on both homophilic and heterophilic graphs with nonlinear diffusion. Moreover, we show that the optimization-induced variants of our models can boost the performance and improve training stability and efficiency as well. As a result, our GIND obtains significant improvements on both node-level and graph-level tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22z.html PDF: https://proceedings.mlr.press/v162/chen22z/chen22z.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22z.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qi family: Chen - given: Yifei family: Wang - given: Yisen family: Wang - given: Jiansheng family: Yang - given: Zhouchen family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3648-3661 id: chen22z issued: date-parts: - 2022 - 6 - 28 firstpage: 3648 lastpage: 3661 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile' abstract: 'Recent years have seen a surge of interest in meta-learning techniques for tackling the few-shot learning (FSL) problem. However, the meta-learner is prone to overfitting since there are only a few available samples, which can be identified as sampling noise on a clean dataset. 
Besides, when handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise on a corrupted dataset. To address these two challenges, we present Eigen-Reptile (ER) that updates the meta-parameters with the main direction of historical task-specific parameters. Specifically, the main direction is computed in a fast way, where the scale of the calculated matrix is related to the number of gradient steps for the specific task instead of the number of parameters. Furthermore, to obtain a more accurate main direction for Eigen-Reptile in the presence of many noisy labels, we further propose Introspective Self-paced Learning (ISPL). We have theoretically and experimentally demonstrated the soundness and effectiveness of the proposed Eigen-Reptile and ISPL. Particularly, our experiments on different tasks show that the proposed method is able to outperform or achieve highly competitive performance compared with other gradient-based methods with or without noisy labels. The code and data for the proposed method are provided for research purposes https://github.com/Anfeather/Eigen-Reptile.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22aa.html PDF: https://proceedings.mlr.press/v162/chen22aa/chen22aa.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22aa.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dong family: Chen - given: Lingfei family: Wu - given: Siliang family: Tang - given: Xiao family: Yun - given: Bo family: Long - given: Yueting family: Zhuang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3662-3678 id: chen22aa issued: date-parts: - 2022 - 6 - 28 firstpage: 3662 lastpage: 3678 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Model Design for Markov Decision Process' abstract: 'In a Markov decision process (MDP), an agent interacts with the environment via perceptions and actions. During this process, the agent aims to maximize its own gain. Hence, appropriate regulations are often required, if we hope to take the external costs/benefits of its actions into consideration. In this paper, we study how to regulate such an agent by redesigning model parameters that can affect the rewards and/or the transition kernels. We formulate this problem as a bilevel program, in which the lower-level MDP is regulated by the upper-level model designer. To solve the resulting problem, we develop a scheme that allows the designer to iteratively predict the agent’s reaction by solving the MDP and then adaptively update model parameters based on the predicted reaction. The algorithm is first theoretically analyzed and then empirically tested on several MDP models arising in economics and robotics.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chen22ab.html PDF: https://proceedings.mlr.press/v162/chen22ab/chen22ab.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22ab.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Siyu family: Chen - given: Donglin family: Yang - given: Jiayang family: Li - given: Senmiao family: Wang - given: Zhuoran family: Yang - given: Zhaoran family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3679-3700 id: chen22ab issued: date-parts: - 2022 - 6 - 28 firstpage: 3679 lastpage: 3700 published: 2022-06-28 00:00:00 +0000 - title: 'State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural Networks' abstract: 'Spiking Neural Networks (SNNs) are considered a promising alternative to Artificial Neural Networks (ANNs) for their event-driven computing paradigm when deployed on energy-efficient neuromorphic hardware. Recently, deep SNNs have shown breathtaking performance improvement through cutting-edge training strategy and flexible structure, which also scales up the number of parameters and computational burdens in a single network. Inspired by the state transition of dendritic spines in the filopodial model of spinogenesis, we model different states of SNN weights, facilitating weight optimization for pruning. Furthermore, the pruning speed can be regulated by using different functions describing the growing threshold of state transition. We organize these techniques as a dynamic pruning algorithm based on a nonlinear reparameterization mapping from spine size to SNN weights. Our approach yields sparse deep networks on the large-scale dataset (SEW ResNet18 on ImageNet) while maintaining a state-of-the-art low performance loss (∼3% at 88.8% sparsity) compared to existing pruning methods on directly trained SNNs. Moreover, we find that regulating the pruning speed during learning is crucial to avoiding disastrous performance degradation at the final stages of training, which may shed light on future work on SNN pruning.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22ac.html PDF: https://proceedings.mlr.press/v162/chen22ac/chen22ac.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22ac.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yanqi family: Chen - given: Zhaofei family: Yu - given: Wei family: Fang - given: Zhengyu family: Ma - given: Tiejun family: Huang - given: Yonghong family: Tian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3701-3715 id: chen22ac issued: date-parts: - 2022 - 6 - 28 firstpage: 3701 lastpage: 3715 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Online ML API Selection for Multi-Label Classification Tasks' abstract: 'Multi-label classification tasks such as OCR and multi-object recognition are a major focus of the growing machine learning as a service industry.
While many multi-label APIs are available, it is challenging for users to decide which API to use for their own data and budget, due to the heterogeneity in their prices and performance. Recent work has shown how to efficiently select and combine single label APIs to optimize performance and cost. However, its computation cost is exponential in the number of labels, and is not suitable for settings like OCR. In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting the user’s budget. It allows combining ML APIs’ predictions for any single data point, and selects the best combination based on an accuracy estimator. We run systematic experiments using ML APIs from Google, Microsoft, Amazon, IBM, Tencent, and other providers for tasks including multi-label image classification, scene text recognition, and named entity recognition. Across these tasks, FrugalMCT can achieve over 90% cost reduction while matching the accuracy of the best single API, or up to 8% better accuracy while matching the best API’s cost.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22ad.html PDF: https://proceedings.mlr.press/v162/chen22ad/chen22ad.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22ad.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lingjiao family: Chen - given: Matei family: Zaharia - given: James family: Zou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3716-3746 id: chen22ad issued: date-parts: - 2022 - 6 - 28 firstpage: 3716 lastpage: 3746 published: 2022-06-28 00:00:00 +0000 - title: 'Data-Efficient Double-Win Lottery Tickets from Robust Pre-training' abstract: 'Pre-training serves as a broadly adopted starting point for transfer learning on various downstream tasks. Recent investigations of lottery tickets hypothesis (LTH) demonstrate such enormous pre-trained models can be replaced by extremely sparse subnetworks (a.k.a. matching subnetworks) without sacrificing transferability. However, practical security-crucial applications usually pose more challenging requirements beyond standard transfer, which also demand these subnetworks to overcome adversarial vulnerability. In this paper, we formulate a more rigorous concept, Double-Win Lottery Tickets, in which a located subnetwork from a pre-trained model can be independently transferred on diverse downstream tasks, to reach BOTH the same standard and robust generalization, under BOTH standard and adversarial training regimes, as the full pre-trained model can do. We comprehensively examine various pre-training mechanisms and find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts. For example, on downstream CIFAR-10/100 datasets, we identify double-win matching subnetworks with the standard, fast adversarial, and adversarial pre-training from ImageNet, at 89.26%/73.79%, 89.26%/79.03%, and 91.41%/83.22% sparsity, respectively. Furthermore, we observe the obtained double-win lottery tickets can be more data-efficient to transfer, under practical data-limited (e.g., 1% and 10%) downstream schemes. 
Our results show that the benefits from robust pre-training are amplified by the lottery ticket scheme, as well as the data-limited transfer setting. Codes are available at https://github.com/VITA-Group/Double-Win-LTH.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22ae.html PDF: https://proceedings.mlr.press/v162/chen22ae/chen22ae.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22ae.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianlong family: Chen - given: Zhenyu family: Zhang - given: Sijia family: Liu - given: Yang family: Zhang - given: Shiyu family: Chang - given: Zhangyang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3747-3759 id: chen22ae issued: date-parts: - 2022 - 6 - 28 firstpage: 3747 lastpage: 3759 published: 2022-06-28 00:00:00 +0000 - title: 'Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness' abstract: 'Certifiable robustness is a highly desirable property for adopting deep neural networks (DNNs) in safety-critical scenarios, but often demands tedious computations to establish. The main hurdle lies in the massive amount of non-linearity in large DNNs. To trade off the DNN expressiveness (which calls for more non-linearity) and robustness certification scalability (which prefers more linearity), we propose a novel solution to strategically manipulate neurons, by "grafting" appropriate levels of linearity. The core of our proposal is to first linearize insignificant ReLU neurons, to eliminate the non-linear components that are both redundant for DNN performance and harmful to its certification. We then optimize the associated slopes and intercepts of the replaced linear activations for restoring model performance while maintaining certifiability. Hence, typical neuron pruning could be viewed as a special case of grafting a linear function of the fixed zero slopes and intercept, that might overly restrict the network flexibility and sacrifice its performance. Extensive experiments on multiple datasets and network backbones show that our linearity grafting can (1) effectively tighten certified bounds; (2) achieve competitive certifiable robustness without certified robust training (i.e., over 30% improvements on CIFAR-10 models); and (3) scale up complete verification to large adversarially trained models with 17M parameters. Codes are available at https://github.com/VITA-Group/Linearity-Grafting.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chen22af.html PDF: https://proceedings.mlr.press/v162/chen22af/chen22af.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22af.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianlong family: Chen - given: Huan family: Zhang - given: Zhenyu family: Zhang - given: Shiyu family: Chang - given: Sijia family: Liu - given: Pin-Yu family: Chen - given: Zhangyang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3760-3772 id: chen22af issued: date-parts: - 2022 - 6 - 28 firstpage: 3760 lastpage: 3772 published: 2022-06-28 00:00:00 +0000 - title: 'Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation' abstract: 'We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where instead of receiving a numeric reward at each step, the RL agent only receives preferences over trajectory pairs from a human overseer. The goal of the RL agent is to learn the optimal policy which is most preferred by the human overseer. Despite the empirical success in various real-world applications, the theoretical understanding of preference-based RL (PbRL) is only limited to the tabular case. In this paper, we propose the first optimistic model-based algorithm for PbRL with general function approximation, which estimates the model using value-targeted regression and calculates the exploratory policies by solving an optimistic planning problem. We prove that our algorithm achieves the regret bound of $\tilde{O} (\operatorname{poly}(d H) \sqrt{K} )$, where $d$ is the complexity measure of the transition and preference model depending on the Eluder dimension and log-covering numbers, $H$ is the planning horizon, $K$ is the number of episodes, and $\tilde O(\cdot)$ omits logarithmic terms. Our lower bound indicates that our algorithm is near-optimal when specialized to the linear setting. Furthermore, we extend the PbRL problem by formulating a novel problem called RL with $n$-wise comparisons, and provide the first sample-efficient algorithm for this new setting. To the best of our knowledge, this is the first theoretical result for PbRL with (general) function approximation.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chen22ag.html PDF: https://proceedings.mlr.press/v162/chen22ag/chen22ag.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22ag.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaoyu family: Chen - given: Han family: Zhong - given: Zhuoran family: Yang - given: Zhaoran family: Wang - given: Liwei family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3773-3793 id: chen22ag issued: date-parts: - 2022 - 6 - 28 firstpage: 3773 lastpage: 3793 published: 2022-06-28 00:00:00 +0000 - title: 'Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis' abstract: 'Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents’ sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}(\epsilon^{-2}\ln\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$, respectively, and achieve an improved communication complexity $\mathcal{O}(\epsilon^{-1}\ln\epsilon^{-1})$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/chen22ah.html PDF: https://proceedings.mlr.press/v162/chen22ah/chen22ah.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chen22ah.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ziyi family: Chen - given: Yi family: Zhou - given: Rong-Rong family: Chen - given: Shaofeng family: Zou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3794-3834 id: chen22ah issued: date-parts: - 2022 - 6 - 28 firstpage: 3794 lastpage: 3834 published: 2022-06-28 00:00:00 +0000 - title: 'Task-aware Privacy Preservation for Multi-dimensional Data' abstract: 'Local differential privacy (LDP) can be adopted to anonymize richer user data attributes that will be input to sophisticated machine learning (ML) tasks. 
However, today’s LDP approaches are largely task-agnostic and often lead to severe performance loss – they simply inject noise to all data attributes according to a given privacy budget, regardless of what features are most relevant for the ultimate task. In this paper, we address how to significantly improve the ultimate task performance with multi-dimensional user data by considering a task-aware privacy preservation problem. The key idea is to use an encoder-decoder framework to learn (and anonymize) a task-relevant latent representation of user data. We obtain an analytical near-optimal solution for the linear setting with mean-squared error (MSE) task loss. We also provide an approximate solution through a gradient-based learning algorithm for general nonlinear cases. Extensive experiments demonstrate that our task-aware approach significantly improves ultimate task accuracy compared to standard benchmark LDP approaches with the same level of privacy guarantee.' volume: 162 URL: https://proceedings.mlr.press/v162/cheng22a.html PDF: https://proceedings.mlr.press/v162/cheng22a/cheng22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cheng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiangnan family: Cheng - given: Ao family: Tang - given: Sandeep family: Chinchali editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3835-3851 id: cheng22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3835 lastpage: 3851 published: 2022-06-28 00:00:00 +0000 - title: 'Adversarially Trained Actor Critic for Offline Reinforcement Learning' abstract: 'We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player Stackelberg game framing of offline RL: A policy actor competes against an adversarially trained value critic, who finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy. We prove that, when the actor attains no regret in the two-player game, running ATAC produces a policy that provably 1) outperforms the behavior policy over a wide range of hyperparameters that control the degree of pessimism, and 2) competes with the best policy covered by data with appropriately chosen hyperparameters. Compared with existing works, notably our framework offers both theoretical guarantees for general function approximation and a deep RL implementation scalable to complex environments and large datasets. In the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/cheng22b.html PDF: https://proceedings.mlr.press/v162/cheng22b/cheng22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cheng22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ching-An family: Cheng - given: Tengyang family: Xie - given: Nan family: Jiang - given: Alekh family: Agarwal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3852-3878 id: cheng22b issued: date-parts: - 2022 - 6 - 28 firstpage: 3852 lastpage: 3878 published: 2022-06-28 00:00:00 +0000 - title: 'Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra' abstract: 'We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues. De-quantizing such algorithms has received a flurry of attention in recent years; we obtain sharper bounds for these problems. More significantly, we achieve these improvements by arguing that the previous quantum-inspired algorithms for these problems are doing leverage or ridge-leverage score sampling in disguise; these are powerful and standard techniques in randomized numerical linear algebra. With this recognition, we are able to employ the large body of work in numerical linear algebra to obtain algorithms for these problems that are simpler or faster (or both) than existing approaches. Our experiments demonstrate that the proposed data structures also work well on real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/chepurko22a.html PDF: https://proceedings.mlr.press/v162/chepurko22a/chepurko22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chepurko22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nadiia family: Chepurko - given: Kenneth family: Clarkson - given: Lior family: Horesh - given: Honghao family: Lin - given: David family: Woodruff editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3879-3900 id: chepurko22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3879 lastpage: 3900 published: 2022-06-28 00:00:00 +0000 - title: 'RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests' abstract: 'Many causal and policy effects of interest are defined by linear functionals of high-dimensional or non-parametric regression functions. $\sqrt{n}$-consistent and asymptotically normal estimation of the object of interest requires debiasing to reduce the effects of regularization and/or model selection on the object of interest. Debiasing is typically achieved by adding a correction term to the plug-in estimator of the functional, which leads to properties such as semi-parametric efficiency, double robustness, and Neyman orthogonality. We implement an automatic debiasing procedure based on automatically learning the Riesz representation of the linear functional using Neural Nets and Random Forests. 
Our method only relies on black-box evaluation oracle access to the linear functional and does not require knowledge of its analytic form. We propose a multitasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. We also propose a Random Forest method which learns a locally linear representation of the Riesz function. Even though our method applies to arbitrary functionals, we experimentally find that it performs well compared to the state of art neural net based algorithm of Shi et al. (2019) for the case of the average treatment effect functional. We also evaluate our method on the problem of estimating average marginal effects with continuous treatments, using semi-synthetic data of gasoline price changes on gasoline demand.' volume: 162 URL: https://proceedings.mlr.press/v162/chernozhukov22a.html PDF: https://proceedings.mlr.press/v162/chernozhukov22a/chernozhukov22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chernozhukov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Victor family: Chernozhukov - given: Whitney family: Newey - given: Vı́ctor M family: Quintas-Martı́nez - given: Vasilis family: Syrgkanis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3901-3914 id: chernozhukov22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3901 lastpage: 3914 published: 2022-06-28 00:00:00 +0000 - title: 'Self-supervised learning with random-projection quantizer for speech recognition' abstract: 'We present a simple and effective self-supervised learning approach for speech recognition. The approach learns a model to predict the masked speech signals, in the form of discrete labels generated with a random-projection quantizer. In particular the quantizer projects speech inputs with a randomly initialized matrix, and does a nearest-neighbor lookup in a randomly-initialized codebook. Neither the matrix nor the codebook are updated during self-supervised learning. Since the random-projection quantizer is not trained and is separated from the speech recognition model, the design makes the approach flexible and is compatible with universal speech recognition architecture. On LibriSpeech our approach achieves similar word-error-rates as previous work using self-supervised learning with non-streaming models, and provides lower word-error-rates than previous work with streaming models. On multilingual tasks the approach also provides significant improvement over wav2vec 2.0 and w2v-BERT.' 
volume: 162 URL: https://proceedings.mlr.press/v162/chiu22a.html PDF: https://proceedings.mlr.press/v162/chiu22a/chiu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chiu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chung-Cheng family: Chiu - given: James family: Qin - given: Yu family: Zhang - given: Jiahui family: Yu - given: Yonghui family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3915-3924 id: chiu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3915 lastpage: 3924 published: 2022-06-28 00:00:00 +0000 - title: 'Discrete Probabilistic Inverse Optimal Transport' abstract: 'Inverse Optimal Transport (IOT) studies the problem of inferring the underlying cost that gives rise to an observation on coupling two probability measures. Couplings appear as the outcome of matching sets (e.g. dating) and moving distributions (e.g. transportation). Compared to Optimal transport (OT), the mathematical theory of IOT is undeveloped. We formalize and systematically analyze the properties of IOT using tools from the study of entropy-regularized OT. Theoretical contributions include characterization of the manifold of cross-ratio equivalent costs, the implications of model priors, and derivation of an MCMC sampler. Empirical contributions include visualizations of cross-ratio equivalent effect on basic examples, simulations validating theoretical results and experiments on real world data.' volume: 162 URL: https://proceedings.mlr.press/v162/chiu22b.html PDF: https://proceedings.mlr.press/v162/chiu22b/chiu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chiu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wei-Ting family: Chiu - given: Pei family: Wang - given: Patrick family: Shafto editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3925-3946 id: chiu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 3925 lastpage: 3946 published: 2022-06-28 00:00:00 +0000 - title: 'Selective Network Linearization for Efficient Private Inference' abstract: 'Private inference (PI) enables inferences directly on cryptographically secure data. While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to $4.25%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. 
To complement empirical results, we present a “no free lunch" theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy.' volume: 162 URL: https://proceedings.mlr.press/v162/cho22a.html PDF: https://proceedings.mlr.press/v162/cho22a/cho22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cho22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Minsu family: Cho - given: Ameya family: Joshi - given: Brandon family: Reagen - given: Siddharth family: Garg - given: Chinmay family: Hegde editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3947-3961 id: cho22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3947 lastpage: 3961 published: 2022-06-28 00:00:00 +0000 - title: 'From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers' abstract: 'In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformers architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. However by casting the problem as a topological (graph-based) modulation of unmasked attention, we obtain several results unknown before, including efficient d-dimensional RPE-masking and graph-kernel masking. We leverage many mathematical techniques ranging from spectral analysis through dynamic programming and random walks to new algorithms for solving Markov processes on graphs. We provide a corresponding empirical evaluation.' volume: 162 URL: https://proceedings.mlr.press/v162/choromanski22a.html PDF: https://proceedings.mlr.press/v162/choromanski22a/choromanski22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-choromanski22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Krzysztof family: Choromanski - given: Han family: Lin - given: Haoxian family: Chen - given: Tianyi family: Zhang - given: Arijit family: Sehanobish - given: Valerii family: Likhosherstov - given: Jack family: Parker-Holder - given: Tamas family: Sarlos - given: Adrian family: Weller - given: Thomas family: Weingarten editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3962-3983 id: choromanski22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3962 lastpage: 3983 published: 2022-06-28 00:00:00 +0000 - title: 'Shuffle Private Linear Contextual Bandits' abstract: 'Differential privacy (DP) has been recently introduced to linear contextual bandits to formally address the privacy concerns in its associated personalized services to participating users (e.g., recommendations). Prior work largely focus on two trust models of DP – the central model, where a central server is responsible for protecting users’ sensitive data, and the (stronger) local model, where information needs to be protected directly on users’ side. 
However, there remains a fundamental gap in the utility achieved by learning algorithms under these two privacy models, e.g., if all users are unique within a learning horizon $T$, $\widetilde{O}(\sqrt{T})$ regret in the central model as compared to $\widetilde{O}(T^{3/4})$ regret in the local model. In this work, we aim to achieve a stronger model of trust than the central model, while suffering a smaller regret than the local model by considering recently popular shuffle model of privacy. We propose a general algorithmic framework for linear contextual bandits under the shuffle trust model, where there exists a trusted shuffler – in between users and the central server– that randomly permutes a batch of users data before sending those to the server. We then instantiate this framework with two specific shuffle protocols – one relying on privacy amplification of local mechanisms, and another incorporating a protocol for summing vectors and matrices of bounded norms. We prove that both these instantiations lead to regret guarantees that significantly improve on that of the local model, and can potentially be of the order $\widetilde{O}(T^{3/5})$ if all users are unique. We also verify this regret behavior with simulations on synthetic data. Finally, under the practical scenario of non-unique users, we show that the regret of our shuffle private algorithm scale as $\widetilde{O}(T^{2/3})$, which matches what the central model could achieve in this case.' volume: 162 URL: https://proceedings.mlr.press/v162/chowdhury22a.html PDF: https://proceedings.mlr.press/v162/chowdhury22a/chowdhury22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chowdhury22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sayak Ray family: Chowdhury - given: Xingyu family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 3984-4009 id: chowdhury22a issued: date-parts: - 2022 - 6 - 28 firstpage: 3984 lastpage: 4009 published: 2022-06-28 00:00:00 +0000 - title: 'DNA: Domain Generalization with Diversified Neural Averaging' abstract: 'The inaccessibility of the target domain data causes domain generalization (DG) methods prone to forget target discriminative features, and challenges the pervasive theme in existing literature in pursuing a single classifier with an ideal joint risk. In contrast, this paper investigates model misspecification and attempts to bridge DG with classifier ensemble theoretically and methodologically. By introducing a pruned Jensen-Shannon (PJS) loss, we show that the target square-root risk w.r.t. the PJS loss of the $\rho$-ensemble (the averaged classifier weighted by a quasi-posterior $\rho$) is bounded by the averaged source square-root risk of the Gibbs classifiers. We derive a tighter bound by enforcing a positive principled diversity measure of the classifiers. We give a PAC-Bayes upper bound on the target square-root risk of the $\rho$-ensemble. Methodologically, we propose a diversified neural averaging (DNA) method for DG, which optimizes the proposed PAC-Bayes bound approximately. The DNA method samples Gibbs classifiers transversely and longitudinally by simultaneously considering the dropout variational family and optimization trajectory. 
The $\rho$-ensemble is approximated by averaging the longitudinal weights in a single run with dropout shut down, ensuring a fast ensemble with low computational overhead. Empirically, the proposed DNA method achieves the state-of-the-art classification performance on standard DG benchmark datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/chu22a.html PDF: https://proceedings.mlr.press/v162/chu22a/chu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xu family: Chu - given: Yujie family: Jin - given: Wenwu family: Zhu - given: Yasha family: Wang - given: Xin family: Wang - given: Shanghang family: Zhang - given: Hong family: Mei editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4010-4034 id: chu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4010 lastpage: 4034 published: 2022-06-28 00:00:00 +0000 - title: 'TPC: Transformation-Specific Smoothing for Point Cloud Models' abstract: 'Point cloud models with neural network architectures have achieved great success and been widely used in safety-critical applications, such as Lidar-based recognition systems in autonomous vehicles. However, such models have been shown to be vulnerable to adversarial attacks, which aim to apply stealthy semantic transformations such as rotation and tapering to mislead model predictions. In this paper, we propose a transformation-specific smoothing framework TPC, which provides tight and scalable robustness guarantees for point cloud models against semantic transformation attacks. We first categorize common 3D transformations into two categories: composable (e.g., rotation) and indirectly composable (e.g., tapering), and we present generic robustness certification strategies for both categories. We then specify unique certification protocols for a range of specific semantic transformations and derive strong robustness guarantees. Extensive experiments on several common 3D transformations show that TPC significantly outperforms the state of the art. For example, our framework boosts the certified accuracy against the twisting transformation along the z-axis (within $\pm$20°) from 20.3% to 83.8%. Codes and models are available at https://github.com/Qianhewu/Point-Cloud-Smoothing.' volume: 162 URL: https://proceedings.mlr.press/v162/chu22b.html PDF: https://proceedings.mlr.press/v162/chu22b/chu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-chu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wenda family: Chu - given: Linyi family: Li - given: Bo family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4035-4056 id: chu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 4035 lastpage: 4056 published: 2022-06-28 00:00:00 +0000 - title: 'Unified Scaling Laws for Routed Language Models' abstract: 'The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count.
Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. Afterwards we provide two applications of these laws: first deriving an Effective Parameter Count along which all models scale at the same rate, and then using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters.' volume: 162 URL: https://proceedings.mlr.press/v162/clark22a.html PDF: https://proceedings.mlr.press/v162/clark22a/clark22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-clark22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aidan family: Clark - given: Diego family: De Las Casas - given: Aurelia family: Guy - given: Arthur family: Mensch - given: Michela family: Paganini - given: Jordan family: Hoffmann - given: Bogdan family: Damoc - given: Blake family: Hechtman - given: Trevor family: Cai - given: Sebastian family: Borgeaud - given: George Bm family: Van Den Driessche - given: Eliza family: Rutherford - given: Tom family: Hennigan - given: Matthew J family: Johnson - given: Albin family: Cassirer - given: Chris family: Jones - given: Elena family: Buchatskaya - given: David family: Budden - given: Laurent family: Sifre - given: Simon family: Osindero - given: Oriol family: Vinyals - given: Marc’Aurelio family: Ranzato - given: Jack family: Rae - given: Erich family: Elsen - given: Koray family: Kavukcuoglu - given: Karen family: Simonyan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4057-4086 id: clark22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4057 lastpage: 4086 published: 2022-06-28 00:00:00 +0000 - title: 'Context-Aware Drift Detection' abstract: 'When monitoring machine learning systems, two-sample tests of homogeneity form the foundation upon which existing approaches to drift detection build. They are used to test for evidence that the distribution underlying recent deployment data differs from that underlying the historical reference data. Often, however, various factors such as time-induced correlation mean that batches of recent deployment data are not expected to form an i.i.d. sample from the historical data distribution. Instead we may wish to test for differences in the distributions conditional on context that is permitted to change. To facilitate this we borrow machinery from the causal inference domain to develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects. We recommend a particular instantiation of the framework based on maximum conditional mean discrepancies. 
We then provide an empirical study demonstrating its effectiveness for various drift detection problems of practical interest, such as detecting drift in the distributions underlying subpopulations of data in a manner that is insensitive to their respective prevalences. The study additionally demonstrates applicability to ImageNet-scale vision problems.' volume: 162 URL: https://proceedings.mlr.press/v162/cobb22a.html PDF: https://proceedings.mlr.press/v162/cobb22a/cobb22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cobb22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Oliver family: Cobb - given: Arnaud family: Van Looveren editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4087-4111 id: cobb22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4087 lastpage: 4111 published: 2022-06-28 00:00:00 +0000 - title: 'On the Robustness of CountSketch to Adaptive Inputs' abstract: 'The last decade saw impressive progress towards understanding the performance of algorithms in adaptive settings, where subsequent inputs may depend on the output from prior inputs. Adaptive settings arise in processes with feedback or with adversarial attacks. Existing designs of robust algorithms are generic wrappers of non-robust counterparts and leave open the possibility of better-tailored designs. The lower bounds (attacks) are similarly worst-case and their significance to practical settings is unclear. Aiming to understand these questions, we study the robustness of \texttt{CountSketch}, a popular dimensionality reduction technique that maps vectors to a lower dimension using randomized linear measurements. The sketch supports recovering $\ell_2$-heavy hitters of a vector (entries with $v[i]^2 \geq \frac{1}{k}\|\boldsymbol{v}\|^2_2$). We show that the classic estimator is not robust, and can be attacked with a number of queries of the order of the sketch size. We propose a robust estimator (for a slightly modified sketch) that allows for a quadratic number of queries in the sketch size, which is an improvement factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior "blackbox" approaches.' 
volume: 162 URL: https://proceedings.mlr.press/v162/cohen22a.html PDF: https://proceedings.mlr.press/v162/cohen22a/cohen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cohen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Edith family: Cohen - given: Xin family: Lyu - given: Jelani family: Nelson - given: Tamas family: Sarlos - given: Moshe family: Shechner - given: Uri family: Stemmer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4112-4140 id: cohen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4112 lastpage: 4140 published: 2022-06-28 00:00:00 +0000 - title: 'Diffusion bridges vector quantized variational autoencoders' abstract: 'Vector Quantized-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior distribution over the discrete states must be trained separately. This prior is generally very complex and leads to slow generation. In this work, we propose a new model to train the prior and the encoder/decoder networks simultaneously. We build a diffusion bridge between a continuous coded vector and a non-informative prior distribution. The latent discrete states are then given as random functions of these continuous vectors. We show that our model is competitive with the autoregressive prior on the mini-Imagenet and CIFAR dataset and is efficient in both optimization and sampling. Our framework also extends the standard VQ-VAE and enables end-to-end training.' volume: 162 URL: https://proceedings.mlr.press/v162/cohen22b.html PDF: https://proceedings.mlr.press/v162/cohen22b/cohen22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cohen22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Max family: Cohen - given: Guillaume family: Quispe - given: Sylvain Le family: Corff - given: Charles family: Ollion - given: Eric family: Moulines editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4141-4156 id: cohen22b issued: date-parts: - 2022 - 6 - 28 firstpage: 4141 lastpage: 4156 published: 2022-06-28 00:00:00 +0000 - title: 'Online and Consistent Correlation Clustering' abstract: 'In the correlation clustering problem the input is a signed graph where the sign indicates whether each pair of points should be placed in the same cluster or not. The goal of the problem is to compute a clustering which minimizes the number of disagreements with such recommendation. Thanks to its many practical applications, correlation clustering is a fundamental unsupervised learning problem and has been extensively studied in many different settings. In this paper we study the problem in the classic online setting with recourse; The vertices of the graphs arrive in an online manner and the goal is to maintain an approximate clustering while minimizing the number of times each vertex changes cluster. 
Our main contribution is an algorithm that achieves logarithmic recourse per vertex in the worst case. We also complement this result with a tight lower bound. Finally, we show experimentally that our algorithm achieves better performance than state-of-the-art algorithms on real-world data.' volume: 162 URL: https://proceedings.mlr.press/v162/cohen-addad22a.html PDF: https://proceedings.mlr.press/v162/cohen-addad22a/cohen-addad22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cohen-addad22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vincent family: Cohen-Addad - given: Silvio family: Lattanzi - given: Andreas family: Maggiori - given: Nikos family: Parotsidis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4157-4179 id: cohen-addad22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4157 lastpage: 4179 published: 2022-06-28 00:00:00 +0000 - title: 'Massively Parallel $k$-Means Clustering for Perturbation Resilient Instances' abstract: 'We consider $k$-means clustering of $n$ data points in Euclidean space in the Massively Parallel Computation (MPC) model, a computational model which is an abstraction of modern massively parallel computing systems such as MapReduce. Recent work provides evidence that getting an $O(1)$-approximate $k$-means solution for general input points using $o(\log n)$ rounds in the MPC model may be impossible under certain conditions [Ghaffari, Kuhn \& Uitto’2019]. However, the real-world data points usually have better structures. One instance of interest is the set of data points which is perturbation resilient [Bilu \& Linial’2010]. In particular, a point set is $\alpha$-perturbation resilient for $k$-means if perturbing pairwise distances by multiplicative factors in the range $[1,\alpha]$ does not change the optimum $k$-means clusters. We bypass the worst-case lower bound by considering perturbation resilient input points and showing $o(\log n)$-round $k$-means clustering algorithms for these instances in the MPC model. Specifically, we show a fully scalable $(1+\varepsilon)$-approximate $k$-means clustering algorithm for $O(\alpha)$-perturbation resilient instances in the MPC model using $O(1)$ rounds and ${O}_{\varepsilon,d}(n^{1+1/\alpha^2+o(1)})$ total space. If the space per machine is sufficiently larger than $k$, i.e., at least $k\cdot n^{\Omega(1)}$, we also develop an optimal $k$-means clustering algorithm for $O(\alpha)$-perturbation resilient instances in MPC using $O(1)$ rounds and ${O}_d(n^{1+o(1)}\cdot(n^{1/\alpha^2}+k))$ total space.' 
volume: 162 URL: https://proceedings.mlr.press/v162/cohen-addad22b.html PDF: https://proceedings.mlr.press/v162/cohen-addad22b/cohen-addad22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cohen-addad22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vincent family: Cohen-Addad - given: Vahab family: Mirrokni - given: Peilin family: Zhong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4180-4201 id: cohen-addad22b issued: date-parts: - 2022 - 6 - 28 firstpage: 4180 lastpage: 4201 published: 2022-06-28 00:00:00 +0000 - title: 'One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence Streams' abstract: 'A popular approach to reduce the size of a massive dataset is to apply efficient online sampling to the stream of data as it is read or generated. Online sampling routines are currently restricted to variations of reservoir sampling, where each sample is selected uniformly and independently of other samples. This renders them unsuitable for large-scale applications in computational biology, such as metagenomic community profiling and protein function annotation, which suffer from severe class imbalance. To maintain a representative and diverse sample, we must identify and preferentially select data that are likely to belong to rare classes. We argue that existing schemes for diversity sampling have prohibitive overhead for large-scale problems and high-throughput streams. We propose an efficient sampling routine that uses an online representation of the data distribution as a prefilter to retain elements from rare groups. We apply this method to several genomic data analysis tasks and demonstrate significant speedup in downstream analysis without sacrificing the quality of the results. Because our algorithm is 2x faster and uses 1000x less memory than coreset, reservoir and sketch-based alternatives, we anticipate that it will become a useful preprocessing step for applications with large-scale streaming data.' volume: 162 URL: https://proceedings.mlr.press/v162/coleman22a.html PDF: https://proceedings.mlr.press/v162/coleman22a/coleman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-coleman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Benjamin family: Coleman - given: Benito family: Geordie - given: Li family: Chou - given: R. A. Leo family: Elworth - given: Todd family: Treangen - given: Anshumali family: Shrivastava editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4202-4218 id: coleman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4202 lastpage: 4218 published: 2022-06-28 00:00:00 +0000 - title: 'Transfer and Marginalize: Explaining Away Label Noise with Privileged Information' abstract: 'Supervised learning datasets often have privileged information, in the form of features which are available at training time but are not available at test time e.g. the ID of the annotator that provided the label. 
We argue that privileged information is useful for explaining away label noise, thereby reducing the harmful impact of noisy labels. We develop a simple and efficient method for supervised learning with neural networks: it transfers via weight sharing the knowledge learned with privileged information and approximately marginalizes over privileged information at test time. Our method, TRAM (TRansfer and Marginalize), has minimal training time overhead and has the same test-time cost as not using privileged information. TRAM performs strongly on CIFAR-10H, ImageNet and Civil Comments benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/collier22a.html PDF: https://proceedings.mlr.press/v162/collier22a/collier22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-collier22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mark family: Collier - given: Rodolphe family: Jenatton - given: Effrosyni family: Kokiopoulou - given: Jesse family: Berent editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4219-4237 id: collier22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4219 lastpage: 4237 published: 2022-06-28 00:00:00 +0000 - title: 'MAML and ANIL Provably Learn Representations' abstract: 'Recent empirical evidence has driven conventional wisdom to believe that gradient-based meta-learning (GBML) methods perform well at few-shot learning because they learn an expressive data representation that is shared across tasks. However, the mechanics of GBML have remained largely mysterious from a theoretical perspective. In this paper, we prove that two well-known GBML methods, MAML and ANIL, as well as their first-order approximations, are capable of learning common representation among a set of given tasks. Specifically, in the well-known multi-task linear representation learning setting, they are able to recover the ground-truth representation at an exponentially fast rate. Moreover, our analysis illuminates that the driving force causing MAML and ANIL to recover the underlying representation is that they adapt the final layer of their model, which harnesses the underlying task diversity to improve the representation in all directions of interest. To the best of our knowledge, these are the first results to show that MAML and/or ANIL learn expressive representations and to rigorously explain why they do so.' 
volume: 162 URL: https://proceedings.mlr.press/v162/collins22a.html PDF: https://proceedings.mlr.press/v162/collins22a/collins22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-collins22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liam family: Collins - given: Aryan family: Mokhtari - given: Sewoong family: Oh - given: Sanjay family: Shakkottai editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4238-4310 id: collins22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4238 lastpage: 4310 published: 2022-06-28 00:00:00 +0000 - title: 'Entropic Causal Inference: Graph Identifiability' abstract: 'Entropic causal inference is a recent framework for learning the causal graph between two variables from observational data by finding the information-theoretically simplest structural explanation of the data, i.e., the model with smallest entropy. In our work, we first extend the causal graph identifiability result in the two-variable setting under relaxed assumptions. We then show the first identifiability result using the entropic approach for learning causal graphs with more than two nodes. Our approach utilizes the property that ancestrality between a source node and its descendants can be determined using the bivariate entropic tests. We provide a sound sequential peeling algorithm for general graphs that relies on this property. We also propose a heuristic algorithm for small graphs that shows strong empirical performance. We rigorously evaluate the performance of our algorithms on synthetic data generated from a variety of models, observing improvement over prior work. Finally we test our algorithms on real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/compton22a.html PDF: https://proceedings.mlr.press/v162/compton22a/compton22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-compton22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Spencer family: Compton - given: Kristjan family: Greenewald - given: Dmitriy A family: Katz - given: Murat family: Kocaoglu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4311-4343 id: compton22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4311 lastpage: 4343 published: 2022-06-28 00:00:00 +0000 - title: 'Mitigating Gender Bias in Face Recognition using the von Mises-Fisher Mixture Model' abstract: 'In spite of the high performance and reliability of deep learning algorithms in a wide range of everyday applications, many investigations tend to show that a lot of models exhibit biases, discriminating against specific subgroups of the population (e.g. gender, ethnicity). This urges the practitioner to develop fair systems with a uniform/comparable performance across sensitive groups. In this work, we investigate the gender bias of deep Face Recognition networks. In order to measure this bias, we introduce two new metrics, BFAR and BFRR, that better reflect the inherent deployment needs of Face Recognition systems. 
Motivated by geometric considerations, we mitigate gender bias through a new post-processing methodology which transforms the deep embeddings of a pre-trained model to give more representation power to discriminated subgroups. It consists in training a shallow neural network by minimizing a Fair von Mises-Fisher loss whose hyperparameters account for the intra-class variance of each gender. Interestingly, we empirically observe that these hyperparameters are correlated with our fairness metrics. In fact, extensive numerical experiments on a variety of datasets show that a careful selection significantly reduces gender bias.' volume: 162 URL: https://proceedings.mlr.press/v162/conti22a.html PDF: https://proceedings.mlr.press/v162/conti22a/conti22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-conti22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jean-Rémy family: Conti - given: Nathan family: Noiry - given: Stephan family: Clemencon - given: Vincent family: Despiegel - given: Stéphane family: Gentric editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4344-4369 id: conti22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4344 lastpage: 4369 published: 2022-06-28 00:00:00 +0000 - title: 'Counterfactual Transportability: A Formal Approach' abstract: 'Generalizing causal knowledge across environments is a common challenge shared across many of the data-driven disciplines, including AI and ML. Experiments are usually performed in one environment (e.g., in a lab, on Earth, in a training ground), almost invariably, with the intent of being used elsewhere (e.g., outside the lab, on Mars, in the real world), in an environment that is related but somewhat different than the original one, where certain conditions and mechanisms are likely to change. This generalization task has been studied in the causal inference literature under the rubric of transportability (Pearl and Bareinboim, 2011). While most transportability works focused on generalizing associational and interventional distributions, the generalization of counterfactual distributions has not been formally studied. In this paper, we investigate the transportability of counterfactuals from an arbitrary combination of observational and experimental distributions coming from disparate domains. Specifically, we introduce a sufficient and necessary graphical condition and develop an efficient, sound, and complete algorithm for transporting counterfactual quantities across domains in nonparametric settings. Failure of the algorithm implies the impossibility of generalizing the target counterfactual from the available data without further assumptions.' 
volume: 162 URL: https://proceedings.mlr.press/v162/correa22a.html PDF: https://proceedings.mlr.press/v162/correa22a/correa22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-correa22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Juan D family: Correa - given: Sanghack family: Lee - given: Elias family: Bareinboim editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4370-4390 id: correa22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4370 lastpage: 4390 published: 2022-06-28 00:00:00 +0000 - title: 'Label-Free Explainability for Unsupervised Models' abstract: 'Unsupervised black-box models are challenging to interpret. Indeed, most existing explainability methods require labels to select which component(s) of the black-box’s output to interpret. In the absence of labels, black-box outputs often are representation vectors whose components do not correspond to any meaningful quantity. Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem. To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance that respectively highlight influential features and training examples for a black-box to construct representations at inference time. We demonstrate that our extensions can be successfully implemented as simple wrappers around many existing feature and example importance methods. We illustrate the utility of our label-free explainability paradigm through a qualitative and quantitative comparison of representation spaces learned by various autoencoders trained on distinct unsupervised tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/crabbe22a.html PDF: https://proceedings.mlr.press/v162/crabbe22a/crabbe22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-crabbe22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jonathan family: Crabbé - given: Mihaela prefix: van der family: Schaar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4391-4420 id: crabbe22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4391 lastpage: 4420 published: 2022-06-28 00:00:00 +0000 - title: 'Evaluating the Adversarial Robustness of Adaptive Test-time Defenses' abstract: 'Adaptive defenses, which optimize at test time, promise to improve adversarial robustness. We categorize such adaptive test-time defenses, explain their potential benefits and drawbacks, and evaluate a representative variety of the latest adaptive defenses for image classification. Unfortunately, none significantly improve upon static defenses when subjected to our careful case study evaluation. Some even weaken the underlying static model while simultaneously increasing inference computation. 
While these results are disappointing, we still believe that adaptive test-time defenses are a promising avenue of research and, as such, we provide recommendations for their thorough evaluation. We extend the checklist of Carlini et al. (2019) by providing concrete steps specific to adaptive defenses.' volume: 162 URL: https://proceedings.mlr.press/v162/croce22a.html PDF: https://proceedings.mlr.press/v162/croce22a/croce22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-croce22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Francesco family: Croce - given: Sven family: Gowal - given: Thomas family: Brunner - given: Evan family: Shelhamer - given: Matthias family: Hein - given: Taylan family: Cemgil editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4421-4435 id: croce22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4421 lastpage: 4435 published: 2022-06-28 00:00:00 +0000 - title: 'Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers' abstract: 'A major drawback of adversarially robust models, in particular for large-scale datasets like ImageNet, is the extremely long training time compared to standard models. Moreover, models should be robust not only to one $l_p$-threat model but ideally to all of them. In this paper we propose Extreme norm Adversarial Training (E-AT) for multiple-norm robustness, which is based on geometric properties of $l_p$-balls. E-AT costs up to three times less than other adversarial training methods for multiple-norm robustness. Using E-AT we show that for ImageNet a single epoch and for CIFAR-10 three epochs are sufficient to turn any $l_p$-robust model into a multiple-norm robust model. In this way we get the first multiple-norm robust model for ImageNet and boost the state-of-the-art for multiple-norm robustness to more than 51% on CIFAR-10. Finally, we study the general transfer via fine-tuning of adversarial robustness between different individual $l_p$-threat models and improve the previous SOTA $l_1$-robustness on both CIFAR-10 and ImageNet. Extensive experiments show that our scheme works across datasets and architectures including vision transformers.' volume: 162 URL: https://proceedings.mlr.press/v162/croce22b.html PDF: https://proceedings.mlr.press/v162/croce22b/croce22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-croce22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Francesco family: Croce - given: Matthias family: Hein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4436-4454 id: croce22b issued: date-parts: - 2022 - 6 - 28 firstpage: 4436 lastpage: 4454 published: 2022-06-28 00:00:00 +0000 - title: 'Self-conditioning Pre-Trained Language Models' abstract: 'In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). 
Grounded in the Product of Experts formulation by Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and conditioning text generation on such concepts. We describe how to identify expert units and how to activate them during inference in order to induce any desired concept in the generated output. We find that the activation of a surprisingly small number of units is sufficient to steer text generation (as few as 3 units in a model with 345M parameters). While the objective of this work is to learn more about how TLMs work, we show that our method is effective for conditioning without fine-tuning or using extra parameters, even on fine-grained homograph concepts. Additionally, we show that our method can be used to correct gender bias present in the output of TLMs and achieves gender parity for all evaluated contexts. We compare our method with FUDGE and PPLM-BoW, and show that our approach is able to achieve gender parity at a lower perplexity and better Self-BLEU score. The proposed method is accessible to a wide audience thanks to its simplicity and minimal compute needs. The findings in this paper are a step forward in understanding the generative mechanisms of TLMs.' volume: 162 URL: https://proceedings.mlr.press/v162/cuadros22a.html PDF: https://proceedings.mlr.press/v162/cuadros22a/cuadros22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cuadros22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xavier Suau family: Cuadros - given: Luca family: Zappella - given: Nicholas family: Apostoloff editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4455-4473 id: cuadros22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4455 lastpage: 4473 published: 2022-06-28 00:00:00 +0000 - title: 'Only tails matter: Average-Case Universality and Robustness in the Convex Regime' abstract: 'The recently developed average-case analysis of optimization methods allows a more fine-grained and representative convergence analysis than usual worst-case results. In exchange, this analysis requires a more precise hypothesis over the data generating process, namely assuming knowledge of the expected spectral distribution (ESD) of the random matrix associated with the problem. This work shows that the concentration of eigenvalues near the edges of the ESD determines a problem’s asymptotic average complexity. A priori information on this concentration is a more grounded assumption than complete knowledge of the ESD. This approximate concentration is effectively a middle ground between the coarseness of worst-case convergence analysis and the restrictiveness of the previous average-case analysis. We also introduce the Generalized Chebyshev method, asymptotically optimal under a hypothesis on this concentration and globally optimal when the ESD follows a Beta distribution. We compare its performance to classical optimization algorithms, such as gradient descent or Nesterov’s scheme, and we show that, in the average-case context, Nesterov’s method is universally nearly optimal asymptotically.' 
volume: 162 URL: https://proceedings.mlr.press/v162/cunha22a.html PDF: https://proceedings.mlr.press/v162/cunha22a/cunha22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cunha22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Leonardo family: Cunha - given: Gauthier family: Gidel - given: Fabian family: Pedregosa - given: Damien family: Scieur - given: Courtney family: Paquette editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4474-4491 id: cunha22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4474 lastpage: 4491 published: 2022-06-28 00:00:00 +0000 - title: 'Principal Component Flows' abstract: 'Normalizing flows map an independent set of latent variables to their samples using a bijective transformation. Despite the exact correspondence between samples and latent variables, their high level relationship is not well understood. In this paper we characterize the geometric structure of flows using principal manifolds and understand the relationship between latent variables and samples using contours. We introduce a novel class of normalizing flows, called principal component flows (PCF), whose contours are its principal manifolds, and a variant for injective flows (iPCF) that is more efficient to train than regular injective flows. PCFs can be constructed using any flow architecture, are trained with a regularized maximum likelihood objective and can perform density estimation on all of their principal manifolds. In our experiments we show that PCFs and iPCFs are able to learn the principal manifolds over a variety of datasets. Additionally, we show that PCFs can perform density estimation on data that lie on a manifold with variable dimensionality, which is not possible with existing normalizing flows.' volume: 162 URL: https://proceedings.mlr.press/v162/cunningham22a.html PDF: https://proceedings.mlr.press/v162/cunningham22a/cunningham22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-cunningham22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Edmond family: Cunningham - given: Adam D family: Cobb - given: Susmit family: Jha editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4492-4519 id: cunningham22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4492 lastpage: 4519 published: 2022-06-28 00:00:00 +0000 - title: 'Deep symbolic regression for recurrence prediction' abstract: 'Symbolic regression, i.e. predicting a function from the observation of its values, is well-known to be a challenging task. In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature. We evaluate our integer model on a subset of OEIS sequences, and show that it outperforms built-in Mathematica functions for recurrence prediction. 
We also demonstrate that our float model is able to yield informative approximations of out-of-vocabulary functions and constants, e.g. $\operatorname{bessel0}(x)\approx \frac{\sin(x)+\cos(x)}{\sqrt{\pi x}}$ and $1.644934\approx \pi^2/6$.' volume: 162 URL: https://proceedings.mlr.press/v162/d-ascoli22a.html PDF: https://proceedings.mlr.press/v162/d-ascoli22a/d-ascoli22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-d-ascoli22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Stéphane family: D’Ascoli - given: Pierre-Alexandre family: Kamienny - given: Guillaume family: Lample - given: Francois family: Charton editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4520-4536 id: d-ascoli22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4520 lastpage: 4536 published: 2022-06-28 00:00:00 +0000 - title: 'Continuous Control with Action Quantization from Demonstrations' abstract: 'In this paper, we propose a novel Reinforcement Learning (RL) framework for problems with continuous action spaces: Action Quantization from Demonstrations (AQuaDem). The proposed approach consists in learning a discretization of continuous action spaces from human demonstrations. This discretization returns a set of plausible actions (in light of the demonstrations) for each input state, thus capturing the priors of the demonstrator and their multimodal behavior. By discretizing the action space, any discrete action deep RL technique can be readily applied to the continuous control problem. Experiments show that the proposed approach outperforms state-of-the-art methods such as SAC in the RL setup, and GAIL in the Imitation Learning setup. We provide a website with interactive videos: https://google-research.github.io/aquadem/ and make the code available: https://github.com/google-research/google-research/tree/master/aquadem.' volume: 162 URL: https://proceedings.mlr.press/v162/dadashi22a.html PDF: https://proceedings.mlr.press/v162/dadashi22a/dadashi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dadashi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Robert family: Dadashi - given: Léonard family: Hussenot - given: Damien family: Vincent - given: Sertan family: Girgin - given: Anton family: Raichuk - given: Matthieu family: Geist - given: Olivier family: Pietquin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4537-4557 id: dadashi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4537 lastpage: 4557 published: 2022-06-28 00:00:00 +0000 - title: 'Dialog Inpainting: Turning Documents into Dialogs' abstract: 'Many important questions (e.g. "How to eat healthier?") require conversation to establish context and explore in depth. However, conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propose a new technique for synthetically generating diverse and high-quality dialog data: dialog inpainting. 
Our approach takes the text of any document and transforms it into a two-person dialog between the writer and an imagined reader: we treat sentences from the article as utterances spoken by the writer, and then use a dialog inpainter to predict what the imagined reader asked or said in between each of the writer’s utterances. By applying this approach to passages from Wikipedia and the web, we produce WikiDialog and WebDialog, two datasets totalling 19 million diverse information-seeking dialogs – 1,000x larger than the largest existing ConvQA dataset. Furthermore, human raters judge the answer adequacy and conversationality of WikiDialog to be as good or better than existing manually-collected datasets. Remarkably, our approach shows strong zero-shot capability, generating high quality synthetic data without using any in-domain ConvQA data. Using our inpainted data to pre-train ConvQA retrieval systems, we significantly advance state-of-the-art across three benchmarks (QReCC, OR-QuAC, TREC CAsT) yielding up to 40% relative gains on standard evaluation metrics.' volume: 162 URL: https://proceedings.mlr.press/v162/dai22a.html PDF: https://proceedings.mlr.press/v162/dai22a/dai22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dai22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhuyun family: Dai - given: Arun Tejasvi family: Chaganty - given: Vincent Y family: Zhao - given: Aida family: Amini - given: Qazi Mamunur family: Rashid - given: Mike family: Green - given: Kelvin family: Guu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4558-4586 id: dai22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4558 lastpage: 4586 published: 2022-06-28 00:00:00 +0000 - title: 'DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training' abstract: 'Personalized federated learning is proposed to handle the data heterogeneity problem amongst clients by learning dedicated tailored local models for each user. However, existing works are often built in a centralized way, leading to high communication pressure and high vulnerability when a failure or an attack on the central server occurs. In this work, we propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named DisPFL, which employs personalized sparse masks to customize sparse local models on the edge. To further save the communication and computation cost, we propose a decentralized sparse training technique, which means that each local model in DisPFL only maintains a fixed number of active parameters throughout the whole local training and peer-to-peer communication process. Comprehensive experiments demonstrate that DisPFL significantly saves the communication bottleneck for the busiest node among all clients and, at the same time, achieves higher model accuracy with less computation cost and communication rounds. Furthermore, we demonstrate that our method can easily adapt to heterogeneous local clients with varying computation complexities and achieves better personalized performances.' 
volume: 162 URL: https://proceedings.mlr.press/v162/dai22b.html PDF: https://proceedings.mlr.press/v162/dai22b/dai22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dai22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rong family: Dai - given: Li family: Shen - given: Fengxiang family: He - given: Xinmei family: Tian - given: Dacheng family: Tao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4587-4604 id: dai22b issued: date-parts: - 2022 - 6 - 28 firstpage: 4587 lastpage: 4604 published: 2022-06-28 00:00:00 +0000 - title: 'Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization' abstract: 'Distributions over discrete sets capture the essential statistics including the high-order correlation among elements. Such information provides powerful insight for decision making across various application domains, e.g., product assortment based on product distribution in shopping carts. While deep generative models trained on pre-collected data can capture existing distributions, such pre-trained models are usually not capable of aligning with a target domain in the presence of distribution shift due to reasons such as temporal shift or the change in the population mix. We develop a general framework to adapt a generative model subject to a (possibly counterfactual) target data distribution with both sampling and computation efficiency. Concretely, instead of re-training a full model from scratch, we reuse the learned modules to preserve the correlations between set elements, while only adjusting corresponding components to align with target marginal constraints. We instantiate the approach for three commonly used forms of discrete set distribution—latent variable, autoregressive, and energy based models—and provide efficient solutions for marginal-constrained optimization in either primal or dual forms. Experiments on both synthetic and real-world e-commerce and EHR datasets show that the proposed framework is able to practically align a generative model to match marginal constraints under distribution shift.' 
volume: 162 URL: https://proceedings.mlr.press/v162/dai22c.html PDF: https://proceedings.mlr.press/v162/dai22c/dai22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dai22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hanjun family: Dai - given: Mengjiao family: Yang - given: Yuan family: Xue - given: Dale family: Schuurmans - given: Bo family: Dai editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4605-4617 id: dai22c issued: date-parts: - 2022 - 6 - 28 firstpage: 4605 lastpage: 4617 published: 2022-06-28 00:00:00 +0000 - title: 'Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning' abstract: 'We propose a novel formulation for the Inverse Reinforcement Learning (IRL) problem, which jointly accounts for the compatibility with the expert behavior of the identified reward and its effectiveness for the subsequent forward learning phase. Albeit quite natural, especially when the final goal is apprenticeship learning (learning policies from an expert), this aspect has been completely overlooked by IRL approaches so far. We propose a new model-free IRL method that is remarkably able to autonomously find a trade-off between the error induced on the learned policy when potentially choosing a sub-optimal reward, and the estimation error caused by using finite samples in the forward learning phase, which can be controlled by explicitly optimizing also the discount factor of the related learning problem. The approach is based on a min-max formulation for the robust selection of the reward parameters and the discount factor so that the distance between the expert’s policy and the learned policy is minimized in the successive forward learning task when a finite and possibly small number of samples is available. Differently from the majority of other IRL techniques, our approach does not involve any planning or forward Reinforcement Learning problems to be solved. After presenting the formulation, we provide a numerical scheme for the optimization, and we show its effectiveness on an illustrative numerical case.' volume: 162 URL: https://proceedings.mlr.press/v162/damiani22a.html PDF: https://proceedings.mlr.press/v162/damiani22a/damiani22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-damiani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Angelo family: Damiani - given: Giorgio family: Manganini - given: Alberto Maria family: Metelli - given: Marcello family: Restelli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4618-4629 id: damiani22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4618 lastpage: 4629 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Robust Generalization in Learning Regular Languages' abstract: 'A key feature of human intelligence is the ability to generalize beyond the training distribution, for instance, parsing longer sentences than seen in the past. 
Currently, deep neural networks struggle to generalize robustly to such shifts in the data distribution. We study robust generalization in the context of using recurrent neural networks (RNNs) to learn regular languages. We hypothesize that standard end-to-end modeling strategies cannot generalize well to systematic distribution shifts and propose a compositional strategy to address this. We compare an end-to-end strategy that maps strings to labels with a compositional strategy that predicts the structure of the deterministic finite state automaton (DFA) that accepts the regular language. We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy. In our experiments, we implement the compositional strategy via an auxiliary task where the goal is to predict the intermediate states visited by the DFA when parsing a string. Our empirical results support our hypothesis, showing that auxiliary tasks can enable robust generalization. Interestingly, the end-to-end RNN generalizes significantly better than the theoretical lower bound, suggesting that it is able to achieve at least some degree of robust generalization.' volume: 162 URL: https://proceedings.mlr.press/v162/dan22a.html PDF: https://proceedings.mlr.press/v162/dan22a/dan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Soham family: Dan - given: Osbert family: Bastani - given: Dan family: Roth editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4630-4643 id: dan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4630 lastpage: 4643 published: 2022-06-28 00:00:00 +0000 - title: 'Unsupervised Image Representation Learning with Deep Latent Particles' abstract: 'We propose a new representation of visual data that disentangles object position from appearance. Our method, termed Deep Latent Particles (DLP), decomposes the visual input into low-dimensional latent “particles”, where each particle is described by its spatial location and features of its surrounding region. To drive learning of such representations, we follow a VAE-based approach and introduce a prior for particle positions based on a spatial-Softmax architecture, and a modification of the evidence lower bound loss inspired by the Chamfer distance between particles. We demonstrate that our DLP representations are useful for downstream tasks such as unsupervised keypoint (KP) detection, image manipulation, and video prediction for scenes composed of multiple dynamic objects. In addition, we show that our probabilistic interpretation of the problem naturally provides uncertainty estimates for particle locations, which can be used for model selection, among other tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/daniel22a.html PDF: https://proceedings.mlr.press/v162/daniel22a/daniel22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-daniel22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tal family: Daniel - given: Aviv family: Tamar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4644-4665 id: daniel22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4644 lastpage: 4665 published: 2022-06-28 00:00:00 +0000 - title: 'Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation' abstract: 'Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks and yet, they perform well in many others. In fact, in practice, they are often selected as the top choices, due to their simplicity. But, for what tasks do such policies succeed? Can we give theoretical guarantees for their favorable performance? These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. Our results apply to value-function-based algorithms in episodic MDPs with bounded Bellman Eluder dimension. We propose a new complexity measure called the myopic exploration gap, denoted by $\alpha$, that captures a structural property of the MDP, the exploration policy and the given value function class. We show that the sample-complexity of myopic exploration scales quadratically with the inverse of this quantity, $1/\alpha^2$. We further demonstrate through concrete examples that the myopic exploration gap is indeed favorable in several tasks where myopic exploration succeeds, due to the corresponding dynamics and reward structure.' volume: 162 URL: https://proceedings.mlr.press/v162/dann22a.html PDF: https://proceedings.mlr.press/v162/dann22a/dann22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dann22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chris family: Dann - given: Yishay family: Mansour - given: Mehryar family: Mohri - given: Ayush family: Sekhari - given: Karthik family: Sridharan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4666-4689 id: dann22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4666 lastpage: 4689 published: 2022-06-28 00:00:00 +0000 - title: 'Monarch: Expressive Structured Matrices for Efficient and Accurate Training' abstract: 'Large neural networks excel in many domains, but they are expensive to train and fine-tune. A popular approach to reduce their compute or memory requirements is to replace dense weight matrices with structured ones (e.g., sparse, low-rank, Fourier transform). 
These methods have not seen widespread adoption (1) in end-to-end training due to unfavorable efficiency–quality tradeoffs, and (2) in dense-to-sparse fine-tuning due to lack of tractable algorithms to approximate a given dense weight matrix. To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms). Surprisingly, the problem of approximating a dense weight matrix with a Monarch matrix, though nonconvex, has an analytical optimal solution. These properties of Monarch matrices unlock new ways to train and fine-tune sparse and dense models. We empirically validate that Monarch can achieve favorable accuracy-efficiency tradeoffs in several end-to-end sparse training applications: speeding up ViT and GPT-2 training on ImageNet classification and Wikitext-103 language modeling by 2x with comparable model quality, and reducing the error on PDE solving and MRI reconstruction tasks by 40%. In sparse-to-dense training, with a simple technique called "reverse sparsification," Monarch matrices serve as a useful intermediate representation to speed up GPT-2 pretraining on OpenWebText by 2x without quality drop. The same technique brings 23% faster BERT pretraining than even the very optimized implementation from Nvidia that set the MLPerf 1.1 record. In dense-to-sparse fine-tuning, as a proof-of-concept, our Monarch approximation algorithm speeds up BERT fine-tuning on GLUE by 1.7x with comparable accuracy.' volume: 162 URL: https://proceedings.mlr.press/v162/dao22a.html PDF: https://proceedings.mlr.press/v162/dao22a/dao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tri family: Dao - given: Beidi family: Chen - given: Nimit S family: Sohoni - given: Arjun family: Desai - given: Michael family: Poli - given: Jessica family: Grogan - given: Alexander family: Liu - given: Aniruddh family: Rao - given: Atri family: Rudra - given: Christopher family: Re editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4690-4721 id: dao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4690 lastpage: 4721 published: 2022-06-28 00:00:00 +0000 - title: 'Score-Guided Intermediate Level Optimization: Fast Langevin Mixing for Inverse Problems' abstract: 'We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators. This result extends the work of Hand and Voroninski from efficient inversion to efficient posterior sampling. In practice, to allow for increased expressivity, we propose to do posterior sampling in the latent space of a pre-trained generative model. To achieve that, we train a score-based model in the latent space of a StyleGAN-2 and we use it to solve inverse problems. Our framework, Score-Guided Intermediate Layer Optimization (SGILO), extends prior work by replacing the sparsity regularization with a generative prior in the intermediate layer. Experimentally, we obtain significant improvements over the previous state-of-the-art, especially in the low measurement regime.' 
volume: 162 URL: https://proceedings.mlr.press/v162/daras22a.html PDF: https://proceedings.mlr.press/v162/daras22a/daras22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-daras22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Giannis family: Daras - given: Yuval family: Dagan - given: Alex family: Dimakis - given: Constantinos family: Daskalakis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4722-4753 id: daras22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4722 lastpage: 4753 published: 2022-06-28 00:00:00 +0000 - title: 'Test-Time Training Can Close the Natural Distribution Shift Performance Gap in Deep Learning Based Compressed Sensing' abstract: 'Deep learning based image reconstruction methods outperform traditional methods. However, neural networks suffer from a performance drop when applied to images from a different distribution than the training images. For example, a model trained for reconstructing knees in accelerated magnetic resonance imaging (MRI) does not reconstruct brains well, even though the same network trained on brains reconstructs brains perfectly well. Thus there is a distribution shift performance gap for a given neural network, defined as the difference in performance when training on a distribution $P$ and training on another distribution $Q$, and evaluating both models on $Q$. In this work, we propose a domain adaptation method for deep learning based compressive sensing that relies on self-supervision during training paired with test-time training at inference. We show that for four natural distribution shifts, this method essentially closes the distribution shift performance gap for state-of-the-art architectures for accelerated MRI.' volume: 162 URL: https://proceedings.mlr.press/v162/darestani22a.html PDF: https://proceedings.mlr.press/v162/darestani22a/darestani22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-darestani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mohammad Zalbagi family: Darestani - given: Jiayu family: Liu - given: Reinhard family: Heckel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4754-4776 id: darestani22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4754 lastpage: 4776 published: 2022-06-28 00:00:00 +0000 - title: 'Knowledge Base Question Answering by Case-based Reasoning over Subgraphs' abstract: 'Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed. However, we hypothesize in a large KB, reasoning patterns required to answer a query type reoccur for various entities in their respective subgraph neighborhoods. 
Leveraging this structural similarity between local neighborhoods of different subgraphs, we introduce a semiparametric model (CBR-SUBG) with (i) a nonparametric component that for each query, dynamically retrieves other similar $k$-nearest neighbor (KNN) training queries along with query-specific subgraphs and (ii) a parametric component that is trained to identify the (latent) reasoning patterns from the subgraphs of KNN queries and then apply them to the subgraph of the target query. We also propose an adaptive subgraph collection strategy to select a query-specific compact subgraph, allowing us to scale to full Freebase KB containing billions of facts. We show that CBR-SUBG can answer queries requiring subgraph reasoning patterns and performs competitively with the best models on several KBQA benchmarks. Our subgraph collection strategy also produces more compact subgraphs (e.g. 55% reduction in size for WebQSP while increasing answer recall by 4.85%)\footnote{Code, model, and subgraphs are available at \url{https://github.com/rajarshd/CBR-SUBG}}.' volume: 162 URL: https://proceedings.mlr.press/v162/das22a.html PDF: https://proceedings.mlr.press/v162/das22a/das22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-das22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rajarshi family: Das - given: Ameya family: Godbole - given: Ankita family: Naik - given: Elliot family: Tower - given: Manzil family: Zaheer - given: Hannaneh family: Hajishirzi - given: Robin family: Jia - given: Andrew family: Mccallum editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4777-4793 id: das22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4777 lastpage: 4793 published: 2022-06-28 00:00:00 +0000 - title: 'Framework for Evaluating Faithfulness of Local Explanations' abstract: 'We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.' 
volume: 162 URL: https://proceedings.mlr.press/v162/dasgupta22a.html PDF: https://proceedings.mlr.press/v162/dasgupta22a/dasgupta22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dasgupta22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sanjoy family: Dasgupta - given: Nave family: Frost - given: Michal family: Moshkovitz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4794-4815 id: dasgupta22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4794 lastpage: 4815 published: 2022-06-28 00:00:00 +0000 - title: 'Distinguishing rule and exemplar-based generalization in learning systems' abstract: 'Machine learning systems often do not share the same inductive biases as humans and, as a result, extrapolate or generalize in ways that are inconsistent with our expectations. The trade-off between exemplar- and rule-based generalization has been studied extensively in cognitive psychology; in this work, we present a protocol inspired by these experimental approaches to probe the inductive biases that control this trade-off in category-learning systems such as artificial neural networks. We isolate two such inductive biases: feature-level bias (differences in which features are more readily learned) and exemplar-vs-rule bias (differences in how these learned features are used for generalization of category labels). We find that standard neural network models are feature-biased and have a propensity towards exemplar-based extrapolation; we discuss the implications of these findings for machine-learning research on data augmentation, fairness, and systematic generalization.' volume: 162 URL: https://proceedings.mlr.press/v162/dasgupta22b.html PDF: https://proceedings.mlr.press/v162/dasgupta22b/dasgupta22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dasgupta22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ishita family: Dasgupta - given: Erin family: Grant - given: Tom family: Griffiths editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4816-4830 id: dasgupta22b issued: date-parts: - 2022 - 6 - 28 firstpage: 4816 lastpage: 4830 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Multi-Objective Bayesian Optimization Under Input Noise' abstract: 'Bayesian optimization (BO) is a sample-efficient approach for tuning design parameters to optimize expensive-to-evaluate, black-box performance metrics. In many manufacturing processes, the design parameters are subject to random input noise, resulting in a product that is often less performant than expected. Although BO methods have been proposed for optimizing a single objective under input noise, no existing method addresses the practical scenario where there are multiple objectives that are sensitive to input perturbations. In this work, we propose the first multi-objective BO method that is robust to input noise. We formalize our goal as optimizing the multivariate value-at-risk (MVaR), a risk measure of the uncertain objectives. 
Since directly optimizing MVaR is computationally infeasible in many settings, we propose a scalable, theoretically-grounded approach for optimizing MVaR using random scalarizations. Empirically, we find that our approach significantly outperforms alternative methods and efficiently identifies optimal robust designs that will satisfy specifications across multiple metrics with high probability.' volume: 162 URL: https://proceedings.mlr.press/v162/daulton22a.html PDF: https://proceedings.mlr.press/v162/daulton22a/daulton22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-daulton22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samuel family: Daulton - given: Sait family: Cakmak - given: Maximilian family: Balandat - given: Michael A. family: Osborne - given: Enlu family: Zhou - given: Eytan family: Bakshy editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4831-4866 id: daulton22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4831 lastpage: 4866 published: 2022-06-28 00:00:00 +0000 - title: 'Attentional Meta-learners for Few-shot Polythetic Classification' abstract: 'Polythetic classifications, based on shared patterns of features that need neither be universal nor constant among members of a class, are common in the natural world and greatly outnumber monothetic classifications over a set of features. We show that threshold meta-learners, such as Prototypical Networks, require an embedding dimension that is exponential in the number of task-relevant features to emulate these functions. In contrast, attentional classifiers, such as Matching Networks, are polythetic by default and able to solve these problems with a linear embedding dimension. However, we find that in the presence of task-irrelevant features, inherent to meta-learning problems, attentional models are susceptible to misclassification. To address this challenge, we propose a self-attention feature-selection mechanism that adaptively dilutes non-discriminative features. We demonstrate the effectiveness of our approach in meta-learning Boolean functions, and synthetic and real-world few-shot learning tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/day22a.html PDF: https://proceedings.mlr.press/v162/day22a/day22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-day22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ben J family: Day - given: Ramon Viñas family: Torné - given: Nikola family: Simidjievski - given: Pietro family: Lió editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4867-4889 id: day22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4867 lastpage: 4889 published: 2022-06-28 00:00:00 +0000 - title: 'Adversarial Vulnerability of Randomized Ensembles' abstract: 'Despite the tremendous success of deep neural networks across various tasks, their vulnerability to imperceptible adversarial perturbations has hindered their deployment in the real world. 
Recently, works on randomized ensembles have empirically demonstrated significant improvements in adversarial robustness over standard adversarially trained (AT) models with minimal computational overhead, making them a promising solution for safety-critical resource-constrained applications. However, this impressive performance raises the question: Are these robustness gains provided by randomized ensembles real? In this work we address this question both theoretically and empirically. We first establish theoretically that commonly employed robustness evaluation methods such as adaptive PGD provide a false sense of security in this setting. Subsequently, we propose a theoretically-sound and efficient adversarial attack algorithm (ARC) capable of compromising random ensembles even in cases where adaptive PGD fails to do so. We conduct comprehensive experiments across a variety of network architectures, training schemes, datasets, and norms to support our claims, and empirically establish that randomized ensembles are in fact more vulnerable to $\ell_p$-bounded adversarial perturbations than even standard AT models. Our code can be found at https://github.com/hsndbk4/ARC.' volume: 162 URL: https://proceedings.mlr.press/v162/dbouk22a.html PDF: https://proceedings.mlr.press/v162/dbouk22a/dbouk22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dbouk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hassan family: Dbouk - given: Naresh family: Shanbhag editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4890-4917 id: dbouk22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4890 lastpage: 4917 published: 2022-06-28 00:00:00 +0000 - title: 'Born-Infeld (BI) for AI: Energy-Conserving Descent (ECD) for Optimization' abstract: 'We introduce a novel framework for optimization based on energy-conserving Hamiltonian dynamics in a strongly mixing (chaotic) regime and establish its key properties analytically and numerically. The prototype is a discretization of Born-Infeld dynamics, with a squared relativistic speed limit depending on the objective function. This class of frictionless, energy-conserving optimizers proceeds unobstructed until slowing naturally near the minimal loss, which dominates the phase space volume of the system. Building from studies of chaotic systems such as dynamical billiards, we formulate a specific algorithm with good performance on machine learning and PDE-solving tasks, including generalization. It cannot stop at a high local minimum, an advantage in non-convex loss functions, and proceeds faster than GD+momentum in shallow valleys.' 
volume: 162 URL: https://proceedings.mlr.press/v162/de-luca22a.html PDF: https://proceedings.mlr.press/v162/de-luca22a/de-luca22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-de-luca22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Giuseppe Bruno family: De Luca - given: Eva family: Silverstein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4918-4936 id: de-luca22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4918 lastpage: 4936 published: 2022-06-28 00:00:00 +0000 - title: 'Error-driven Input Modulation: Solving the Credit Assignment Problem without a Backward Pass' abstract: 'Supervised learning in artificial neural networks typically relies on backpropagation, where the weights are updated based on the error-function gradients and sequentially propagated from the output layer to the input layer. Although this approach has proven effective in a wide domain of applications, it lacks biological plausibility in many regards, including the weight symmetry problem, the dependence of learning on non-local signals, the freezing of neural activity during error propagation, and the update locking problem. Alternative training schemes have been introduced, including sign symmetry, feedback alignment, and direct feedback alignment, but they invariably rely on a backward pass that hinders the possibility of solving all the issues simultaneously. Here, we propose to replace the backward pass with a second forward pass in which the input signal is modulated based on the error of the network. We show that this novel learning rule comprehensively addresses all the above-mentioned issues and can be applied to both fully connected and convolutional models. We test this learning rule on MNIST, CIFAR-10, and CIFAR-100. These results help incorporate biological principles into machine learning.' volume: 162 URL: https://proceedings.mlr.press/v162/dellaferrera22a.html PDF: https://proceedings.mlr.press/v162/dellaferrera22a/dellaferrera22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dellaferrera22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Giorgia family: Dellaferrera - given: Gabriel family: Kreiman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4937-4955 id: dellaferrera22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4937 lastpage: 4955 published: 2022-06-28 00:00:00 +0000 - title: 'DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations' abstract: 'Reconstruction-based Model-Based Reinforcement Learning (MBRL) agents, such as Dreamer, often fail to discard task-irrelevant visual distractions that are prevalent in natural scenes. In this paper, we propose a reconstruction-free MBRL agent, called DreamerPro, that can enhance robustness to distractions. Motivated by the recent success of prototypical representations, a non-contrastive self-supervised learning approach in computer vision, DreamerPro combines Dreamer with prototypes. 
In order for the prototypes to benefit temporal dynamics learning in MBRL, we propose to additionally learn the prototypes from the recurrent states of the world model, thereby distilling temporal structures from past observations and actions into the prototypes. Experiments on the DeepMind Control suite show that DreamerPro achieves better overall performance than state-of-the-art contrastive MBRL agents when there are complex background distractions, and maintains similar performance as Dreamer in standard tasks where contrastive MBRL agents can perform much worse.' volume: 162 URL: https://proceedings.mlr.press/v162/deng22a.html PDF: https://proceedings.mlr.press/v162/deng22a/deng22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-deng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fei family: Deng - given: Ingook family: Jang - given: Sungjin family: Ahn editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4956-4975 id: deng22a issued: date-parts: - 2022 - 6 - 28 firstpage: 4956 lastpage: 4975 published: 2022-06-28 00:00:00 +0000 - title: 'NeuralEF: Deconstructing Kernels by Deep Neural Networks' abstract: 'Learning the principal eigenfunctions of an integral operator defined by a kernel and a data distribution is at the core of many machine learning problems. Traditional nonparametric solutions based on the Nystrom formula suffer from scalability issues. Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions. However, the existing method relies on an expensive orthogonalization step and is difficult to implement. We show that these problems can be fixed by using a new series of objective functions that generalizes the EigenGame to function space. We test our method on a variety of supervised and unsupervised learning problems and show it provides accurate approximations to the eigenfunctions of polynomial, radial basis, neural network Gaussian process, and neural tangent kernels. Finally, we demonstrate our method can scale up linearised Laplace approximation of deep neural networks to modern image classification datasets through approximating the Gauss-Newton matrix. Code is available at https://github.com/thudzj/neuraleigenfunction.' volume: 162 URL: https://proceedings.mlr.press/v162/deng22b.html PDF: https://proceedings.mlr.press/v162/deng22b/deng22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-deng22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhijie family: Deng - given: Jiaxin family: Shi - given: Jun family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4976-4992 id: deng22b issued: date-parts: - 2022 - 6 - 28 firstpage: 4976 lastpage: 4992 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Causal Metric Learning' abstract: 'Deep metric learning aims to learn distance metrics that measure similarities and dissimilarities between samples. 
The existing approaches typically focus on designing different hard sample mining or distance margin strategies and then minimize a pair/triplet-based or proxy-based loss over the training data. However, this can lead the model to recklessly learn all the correlated distances found in training data including the spurious distance (e.g., background differences) that is not the distance of interest and can harm the generalization of the learned metric. To address this issue, we study metric learning from a causality perspective and accordingly propose deep causal metric learning (DCML) that pursues the true causality of the distance between samples. DCML is achieved through explicitly learning environment-invariant attention and task-invariant embedding based on causal inference. Extensive experiments on several benchmark datasets demonstrate the superiority of DCML over the existing methods.' volume: 162 URL: https://proceedings.mlr.press/v162/deng22c.html PDF: https://proceedings.mlr.press/v162/deng22c/deng22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-deng22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiang family: Deng - given: Zhongfei family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 4993-5006 id: deng22c issued: date-parts: - 2022 - 6 - 28 firstpage: 4993 lastpage: 5006 published: 2022-06-28 00:00:00 +0000 - title: 'On the Convergence of Inexact Predictor-Corrector Methods for Linear Programming' abstract: 'Interior point methods (IPMs) are a common approach for solving linear programs (LPs) with strong theoretical guarantees and solid empirical performance. The time complexity of these methods is dominated by the cost of solving a linear system of equations at each iteration. In common applications of linear programming, particularly in machine learning and scientific computing, the size of this linear system can become prohibitively large, requiring the use of iterative solvers, which provide an approximate solution to the linear system. However, approximately solving the linear system at each iteration of an IPM invalidates the theoretical guarantees of common IPM analyses. To remedy this, we theoretically and empirically analyze (slightly modified) predictor-corrector IPMs when using approximate linear solvers: our approach guarantees that, when certain conditions are satisfied, the number of IPM iterations does not increase and that the final solution remains feasible. We also provide practical instantiations of approximate linear solvers that satisfy these conditions for special classes of constraint matrices using randomized linear algebra.' 
volume: 162 URL: https://proceedings.mlr.press/v162/dexter22a.html PDF: https://proceedings.mlr.press/v162/dexter22a/dexter22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dexter22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Gregory family: Dexter - given: Agniva family: Chowdhury - given: Haim family: Avron - given: Petros family: Drineas editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5007-5038 id: dexter22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5007 lastpage: 5038 published: 2022-06-28 00:00:00 +0000 - title: 'Analysis of Stochastic Processes through Replay Buffers' abstract: 'Replay buffers are a key component in many reinforcement learning schemes. Yet, their theoretical properties are not fully understood. In this paper we analyze a system where a stochastic process X is pushed into a replay buffer and then randomly sampled to generate a stochastic process Y from the replay buffer. We provide an analysis of the properties of the sampled process such as stationarity, Markovity and autocorrelation in terms of the properties of the original process. Our theoretical analysis sheds light on why replay buffer may be a good de-correlator. Our analysis provides theoretical tools for proving the convergence of replay buffer based algorithms which are prevalent in reinforcement learning schemes.' volume: 162 URL: https://proceedings.mlr.press/v162/di-castro22a.html PDF: https://proceedings.mlr.press/v162/di-castro22a/di-castro22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-di-castro22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shirli family: Di-Castro - given: Shie family: Mannor - given: Dotan Di family: Castro editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5039-5060 id: di-castro22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5039 lastpage: 5060 published: 2022-06-28 00:00:00 +0000 - title: 'Streaming Algorithms for High-Dimensional Robust Statistics' abstract: 'We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust statistics tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber’s contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.' 
volume: 162 URL: https://proceedings.mlr.press/v162/diakonikolas22a.html PDF: https://proceedings.mlr.press/v162/diakonikolas22a/diakonikolas22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-diakonikolas22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ilias family: Diakonikolas - given: Daniel M. family: Kane - given: Ankit family: Pensia - given: Thanasis family: Pittas editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5061-5117 id: diakonikolas22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5061 lastpage: 5117 published: 2022-06-28 00:00:00 +0000 - title: 'Learning General Halfspaces with Adversarial Label Noise via Online Gradient Descent' abstract: 'We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces with adversarial label noise under the Gaussian distribution. Prior work has provided a sophisticated polynomial-time algorithm for this problem. In this work, we show that the problem can be solved directly via online gradient descent applied to a sequence of natural non-convex surrogates. This approach yields a simple iterative learning algorithm for general halfspaces with near-optimal sample complexity, runtime, and error guarantee. At the conceptual level, our work establishes an intriguing connection between learning halfspaces with adversarial noise and online optimization that may find other applications.' volume: 162 URL: https://proceedings.mlr.press/v162/diakonikolas22b.html PDF: https://proceedings.mlr.press/v162/diakonikolas22b/diakonikolas22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-diakonikolas22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ilias family: Diakonikolas - given: Vasilis family: Kontonis - given: Christos family: Tzamos - given: Nikos family: Zarifis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5118-5141 id: diakonikolas22b issued: date-parts: - 2022 - 6 - 28 firstpage: 5118 lastpage: 5141 published: 2022-06-28 00:00:00 +0000 - title: 'Variational Feature Pyramid Networks' abstract: 'Recent architectures for object detection adopt a Feature Pyramid Network as a backbone for deep feature extraction. Many works focus on the design of pyramid networks which produce richer feature representations. In this work, we opt to learn a dataset-specific architecture for Feature Pyramid Networks. With the proposed method, the network fuses features at multiple scales, is efficient in terms of parameters and operations, and yields better results across a variety of tasks and datasets. Starting from a complex network, we adopt Variational Inference to prune redundant connections. Our model, integrated with standard detectors, outperforms the state-of-the-art feature fusion networks.'
volume: 162 URL: https://proceedings.mlr.press/v162/dimitrakopoulos22a.html PDF: https://proceedings.mlr.press/v162/dimitrakopoulos22a/dimitrakopoulos22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dimitrakopoulos22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Panagiotis family: Dimitrakopoulos - given: Giorgos family: Sfikas - given: Christophoros family: Nikou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5142-5152 id: dimitrakopoulos22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5142 lastpage: 5152 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Doubly Stochastic Clustering' abstract: 'The problem of projecting a matrix onto the space of doubly stochastic matrices finds several applications in machine learning. For example, in spectral clustering, it has been shown that forming the normalized Laplacian matrix from a data affinity matrix has close connections to projecting it onto the set of doubly stochastic matrices. However, the analysis of why this projection improves clustering has been limited. In this paper we present theoretical conditions on the given affinity matrix under which its doubly stochastic projection is an ideal affinity matrix (i.e., it has no false connections between clusters, and is well-connected within each cluster). In particular, we show that a necessary and sufficient condition for a projected affinity matrix to be ideal reduces to a set of conditions on the input affinity that decompose along each cluster. Further, in the subspace clustering problem, where each cluster is defined by a linear subspace, we provide geometric conditions on the underlying subspaces which guarantee correct clustering via a continuous version of the problem. This allows us to explain theoretically the remarkable performance of a recently proposed doubly stochastic subspace clustering method.' volume: 162 URL: https://proceedings.mlr.press/v162/ding22a.html PDF: https://proceedings.mlr.press/v162/ding22a/ding22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ding22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianjiao family: Ding - given: Derek family: Lim - given: Rene family: Vidal - given: Benjamin D family: Haeffele editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5153-5165 id: ding22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5153 lastpage: 5165 published: 2022-06-28 00:00:00 +0000 - title: 'Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence' abstract: 'We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To learn a Nash equilibrium of an MPG in which the size of state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. 
When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$-Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity which does not explicitly depend on the state space size. When the exact gradient is not available, we establish $O(1/\epsilon^5)$ sample complexity bound in a potentially infinitely large state space for a sample-based algorithm that utilizes function approximation. Moreover, we identify a class of independent policy gradient algorithms that enjoy convergence for both zero-sum Markov games and Markov cooperative games with the players that are oblivious to the types of games being played. Finally, we provide computational experiments to corroborate the merits and the effectiveness of our theoretical developments.' volume: 162 URL: https://proceedings.mlr.press/v162/ding22b.html PDF: https://proceedings.mlr.press/v162/ding22b/ding22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ding22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dongsheng family: Ding - given: Chen-Yu family: Wei - given: Kaiqing family: Zhang - given: Mihailo family: Jovanovic editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5166-5220 id: ding22b issued: date-parts: - 2022 - 6 - 28 firstpage: 5166 lastpage: 5220 published: 2022-06-28 00:00:00 +0000 - title: 'Generalization and Robustness Implications in Object-Centric Learning' abstract: 'The idea behind object-centric representation learning is that natural scenes can better be modeled as compositions of objects and their relations as opposed to distributed representations. This inductive bias can be injected into neural networks to potentially improve systematic generalization and performance of downstream tasks in scenes with multiple objects. In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation metrics and downstream object property prediction. In addition, we study generalization and robustness by investigating the settings where either a single object is out of distribution – e.g., having an unseen color, texture, or shape – or global properties of the scene are altered – e.g., by occlusions, cropping, or increasing the number of objects. From our experimental study, we find object-centric representations to be useful for downstream tasks and generally robust to most distribution shifts affecting objects. However, when the distribution shift affects the input in a less structured manner, robustness in terms of segmentation and downstream task performance may vary significantly across models and distribution shifts.' 
volume: 162 URL: https://proceedings.mlr.press/v162/dittadi22a.html PDF: https://proceedings.mlr.press/v162/dittadi22a/dittadi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dittadi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrea family: Dittadi - given: Samuele S family: Papa - given: Michele family: De Vita - given: Bernhard family: Schölkopf - given: Ole family: Winther - given: Francesco family: Locatello editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5221-5285 id: dittadi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5221 lastpage: 5285 published: 2022-06-28 00:00:00 +0000 - title: 'Fair Generalized Linear Models with a Convex Penalty' abstract: 'Despite recent advances in algorithmic fairness, methodologies for achieving fairness with generalized linear models (GLMs) have yet to be explored in general, despite GLMs being widely used in practice. In this paper we introduce two fairness criteria for GLMs based on equalizing expected outcomes or log-likelihoods. We prove that for GLMs both criteria can be achieved via a convex penalty term based solely on the linear components of the GLM, thus permitting efficient optimization. We also derive theoretical properties for the resulting fair GLM estimator. To empirically demonstrate the efficacy of the proposed fair GLM, we compare it with other well-known fair prediction methods on an extensive set of benchmark datasets for binary classification and regression. In addition, we demonstrate that the fair GLM can generate fair predictions for a range of response variables, other than binary and continuous outcomes.' volume: 162 URL: https://proceedings.mlr.press/v162/do22a.html PDF: https://proceedings.mlr.press/v162/do22a/do22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-do22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hyungrok family: Do - given: Preston family: Putzel - given: Axel S family: Martin - given: Padhraic family: Smyth - given: Judy family: Zhong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5286-5308 id: do22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5286 lastpage: 5308 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense' abstract: 'We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize the learning approach for approximating the multi-modal posterior distribution of an adversarially trained Bayesian model can lead to mode collapse; consequently, the model’s achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. 
Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly, we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step towards a basis for a principled method of adversarially training BNNs. Our extensive experimental results demonstrate significantly improved robustness of up to 20% compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both the CIFAR-10 and STL-10 datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/doan22a.html PDF: https://proceedings.mlr.press/v162/doan22a/doan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-doan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bao Gia family: Doan - given: Ehsan M. family: Abbasnejad - given: Javen Qinfeng family: Shi - given: Damith C. family: Ranasinghe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5309-5323 id: doan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5309 lastpage: 5323 published: 2022-06-28 00:00:00 +0000 - title: 'On the Adversarial Robustness of Causal Algorithmic Recourse' abstract: 'Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable classification outcomes from automated decision-making systems. Recourse recommendations should ideally be robust to reasonably small uncertainty in the features of the individual seeking recourse. In this work, we formulate the adversarially robust recourse problem and show that recourse methods that offer minimally costly recourse fail to be robust. We then present methods for generating adversarially robust recourse for linear and for differentiable classifiers. Finally, we show that regularizing the decision-making classifier to behave locally linearly and to rely more strongly on actionable features facilitates the existence of adversarially robust recourse.'
volume: 162 URL: https://proceedings.mlr.press/v162/dominguez-olmedo22a.html PDF: https://proceedings.mlr.press/v162/dominguez-olmedo22a/dominguez-olmedo22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dominguez-olmedo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ricardo family: Dominguez-Olmedo - given: Amir H family: Karimi - given: Bernhard family: Schölkopf editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5324-5342 id: dominguez-olmedo22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5324 lastpage: 5342 published: 2022-06-28 00:00:00 +0000 - title: 'Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks' abstract: 'Quantized neural networks typically require smaller memory footprints and lower computation complexity, which is crucial for efficient deployment. However, quantization inevitably leads to a distribution divergence from the original network, which generally degrades the performance. To tackle this issue, massive efforts have been made, but most existing approaches lack statistical considerations and depend on several manual configurations. In this paper, we present an adaptive-mapping quantization method to learn an optimal latent sub-distribution that is inherent within models and smoothly approximated with a concrete Gaussian Mixture (GM). In particular, the network weights are projected in compliance with the GM-approximated sub-distribution. This sub-distribution evolves along with the weight update in a co-tuning schema guided by the direct task-objective optimization. Sufficient experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method. Besides, an efficient deployment flow for the mobile CPU is developed, achieving up to 7.46$\times$ inference acceleration on an octa-core ARM CPU. Our codes have been publicly released at https://github.com/RunpeiDong/DGMS.' volume: 162 URL: https://proceedings.mlr.press/v162/dong22a.html PDF: https://proceedings.mlr.press/v162/dong22a/dong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Runpei family: Dong - given: Zhanhong family: Tan - given: Mengdi family: Wu - given: Linfeng family: Zhang - given: Kaisheng family: Ma editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5343-5359 id: dong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5343 lastpage: 5359 published: 2022-06-28 00:00:00 +0000 - title: 'PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs' abstract: 'Optimization of directed acyclic graph (DAG) structures has many applications, such as neural architecture search (NAS) and probabilistic graphical model learning. Encoding DAGs into real vectors is a dominant component in most neural-network-based DAG optimization frameworks. 
Currently, most popular DAG encoders use an asynchronous message passing scheme which sequentially processes nodes according to the dependency between nodes in a DAG. That is, a node must not be processed until all its predecessors are processed. As a result, they are inherently not parallelizable. In this work, we propose a Parallelizable Attention-based Computation structure Encoder (PACE) that processes nodes simultaneously and encodes DAGs in parallel. We demonstrate the superiority of PACE through encoder-dependent optimization subroutines that search the optimal DAG structure based on the learned DAG embeddings. Experiments show that PACE not only improves the effectiveness over previous sequential DAG encoders with a significantly boosted training and inference speed, but also generates smooth latent (DAG encoding) spaces that are beneficial to downstream optimization subroutines.' volume: 162 URL: https://proceedings.mlr.press/v162/dong22b.html PDF: https://proceedings.mlr.press/v162/dong22b/dong22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dong22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zehao family: Dong - given: Muhan family: Zhang - given: Fuhai family: Li - given: Yixin family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5360-5377 id: dong22b issued: date-parts: - 2022 - 6 - 28 firstpage: 5360 lastpage: 5377 published: 2022-06-28 00:00:00 +0000 - title: 'Privacy for Free: How does Dataset Condensation Help Privacy?' abstract: 'To prevent unintentional data leakage, the research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of data privacy, existing solutions suffer from either expensive training cost or poor generalization performance. Therefore, we raise the question of whether training efficiency and privacy can be achieved simultaneously. In this work, we for the first time identify that dataset condensation (DC), which is originally designed for improving training efficiency, is also a better solution to replace the traditional data generators for private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy, and theoretically prove on linear feature extractors (and then extend to non-linear feature extractors) that the existence of one sample has limited impact ($O(m/n)$) on the parameter distribution of networks trained on $m$ samples synthesized from $n (n \gg m)$ raw samples by DC. We also empirically validate the visual privacy and membership privacy of DC-synthesized data by launching both the loss-based and the state-of-the-art likelihood-based membership inference attacks. We envision this work as a milestone for data-efficient and privacy-preserving machine learning.'
volume: 162 URL: https://proceedings.mlr.press/v162/dong22c.html PDF: https://proceedings.mlr.press/v162/dong22c/dong22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dong22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tian family: Dong - given: Bo family: Zhao - given: Lingjuan family: Lyu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5378-5396 id: dong22c issued: date-parts: - 2022 - 6 - 28 firstpage: 5378 lastpage: 5396 published: 2022-06-28 00:00:00 +0000 - title: 'Fast rates for noisy interpolation require rethinking the effect of inductive bias' abstract: 'Good generalization performance on high-dimensional data crucially hinges on a simple structure of the ground truth and a corresponding strong inductive bias of the estimator. Even though this intuition is valid for regularized models, in this paper we caution against a strong inductive bias for interpolation in the presence of noise: While a stronger inductive bias encourages a simpler structure that is more aligned with the ground truth, it also increases the detrimental effect of noise. Specifically, for both linear regression and classification with a sparse ground truth, we prove that minimum $\ell_p$-norm and maximum $\ell_p$-margin interpolators achieve fast polynomial rates close to order $1/n$ for $p > 1$ compared to a logarithmic rate for $p = 1$. Finally, we provide preliminary experimental evidence that this trade-off may also play a crucial role in understanding non-linear interpolating models used in practice.' volume: 162 URL: https://proceedings.mlr.press/v162/donhauser22a.html PDF: https://proceedings.mlr.press/v162/donhauser22a/donhauser22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-donhauser22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Konstantin family: Donhauser - given: Nicolò family: Ruggeri - given: Stefan family: Stojanovic - given: Fanny family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5397-5428 id: donhauser22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5397 lastpage: 5428 published: 2022-06-28 00:00:00 +0000 - title: 'Adapting to Mixing Time in Stochastic Optimization with Markovian Data' abstract: 'We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. 
Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.' volume: 162 URL: https://proceedings.mlr.press/v162/dorfman22a.html PDF: https://proceedings.mlr.press/v162/dorfman22a/dorfman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dorfman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ron family: Dorfman - given: Kfir Yehuda family: Levy editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5429-5446 id: dorfman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5429 lastpage: 5446 published: 2022-06-28 00:00:00 +0000 - title: 'TACTiS: Transformer-Attentional Copulas for Time Series' abstract: 'The estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance. However, the practical utility of such estimates is limited by how accurately they quantify predictive uncertainty. In this work, we address the problem of estimating the joint predictive distribution of high-dimensional multivariate time series. We propose a versatile method, based on the transformer architecture, that estimates joint distributions using an attention-based decoder that provably learns to mimic the properties of non-parametric copulas. The resulting model has several desirable properties: it can scale to hundreds of time series, supports both forecasting and interpolation, can handle unaligned and non-uniformly sampled data, and can seamlessly adapt to missing data during training. We demonstrate these properties empirically and show that our model produces state-of-the-art predictions on multiple real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/drouin22a.html PDF: https://proceedings.mlr.press/v162/drouin22a/drouin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-drouin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexandre family: Drouin - given: Étienne family: Marcotte - given: Nicolas family: Chapados editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5447-5493 id: drouin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5447 lastpage: 5493 published: 2022-06-28 00:00:00 +0000 - title: 'Branching Reinforcement Learning' abstract: 'In this paper, we propose a novel Branching Reinforcement Learning (Branching RL) model, and investigate both Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model. Unlike standard RL where the trajectory of each episode is a single $H$-step path, branching RL allows an agent to take multiple base actions in a state such that transitions branch out to multiple successor states correspondingly, and thus it generates a tree-structured trajectory. This model finds important applications in hierarchical recommendation systems and online advertising. 
For branching RL, we establish new Bellman equations and key lemmas, i.e., the branching value difference lemma and the branching law of total variance, and also bound the total variance by only $O(H^2)$ under an exponentially-large trajectory. For RM and RFE metrics, we propose computationally efficient algorithms BranchVI and BranchRFE, respectively, and derive nearly matching upper and lower bounds. Our regret and sample complexity results are polynomial in all problem parameters despite exponentially-large trajectories.' volume: 162 URL: https://proceedings.mlr.press/v162/du22a.html PDF: https://proceedings.mlr.press/v162/du22a/du22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-du22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yihan family: Du - given: Wei family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5494-5530 id: du22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5494 lastpage: 5530 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Imitation Learning for End-to-End Mobile Manipulation' abstract: 'In this work we investigate and demonstrate the benefits of a Bayesian approach to imitation learning from multiple sensor inputs, as applied to the task of opening office doors with a mobile manipulator. Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities, especially for tasks that may favor different sensors in different situations. As we scale multi-sensor robotic learning to unstructured real-world settings (e.g. offices, homes) and more complex robot behaviors, we also increase reliance on simulators for cost, efficiency, and safety. Consequently, the sim-to-real gap across multiple sensor modalities also increases, making simulated validation more difficult. We show that using the Variational Information Bottleneck (Alemi et al., 2016) to regularize convolutional neural networks improves generalization to held-out domains and reduces the sim-to-real gap in a sensor-agnostic manner. As a side effect, the learned embeddings also provide useful estimates of model uncertainty for each sensor. We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities based on an understanding of the situational uncertainty of each sensor. In a real-world office environment, we achieve 96% task success, improving upon the baseline by +16%.'
volume: 162 URL: https://proceedings.mlr.press/v162/du22b.html PDF: https://proceedings.mlr.press/v162/du22b/du22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-du22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuqing family: Du - given: Daniel family: Ho - given: Alex family: Alemi - given: Eric family: Jang - given: Mohi family: Khansari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5531-5546 id: du22b issued: date-parts: - 2022 - 6 - 28 firstpage: 5531 lastpage: 5546 published: 2022-06-28 00:00:00 +0000 - title: 'GLaM: Efficient Scaling of Language Models with Mixture-of-Experts' abstract: 'Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong results on in-context learning tasks. However, training these large dense models requires significant amounts of computing resources. In this paper, we propose and develop a family of language models named \glam (\textbf{G}eneralist \textbf{La}nguage \textbf{M}odel), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. The largest \glam has 1.2 trillion parameters, which is approximately 7x larger than GPT-3. It consumes only 1/3 of the energy used to train GPT-3 and requires half of the computation flops for inference, while still achieving better overall fewshot performance across 29 NLP tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/du22c.html PDF: https://proceedings.mlr.press/v162/du22c/du22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-du22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nan family: Du - given: Yanping family: Huang - given: Andrew M family: Dai - given: Simon family: Tong - given: Dmitry family: Lepikhin - given: Yuanzhong family: Xu - given: Maxim family: Krikun - given: Yanqi family: Zhou - given: Adams Wei family: Yu - given: Orhan family: Firat - given: Barret family: Zoph - given: Liam family: Fedus - given: Maarten P family: Bosma - given: Zongwei family: Zhou - given: Tao family: Wang - given: Emma family: Wang - given: Kellie family: Webster - given: Marie family: Pellat - given: Kevin family: Robinson - given: Kathleen family: Meier-Hellstern - given: Toju family: Duke - given: Lucas family: Dixon - given: Kun family: Zhang - given: Quoc family: Le - given: Yonghui family: Wu - given: Zhifeng family: Chen - given: Claire family: Cui editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5547-5569 id: du22c issued: date-parts: - 2022 - 6 - 28 firstpage: 5547 lastpage: 5569 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Iterative Reasoning through Energy Minimization' abstract: 'Deep learning has excelled on complex pattern recognition tasks such as image classification and object recognition. 
However, it struggles with tasks requiring nontrivial reasoning, such as algorithmic computation. Humans are able to solve such tasks through iterative reasoning – spending more time to think about harder tasks. Most existing neural networks, however, exhibit a fixed computational budget controlled by the neural network architecture, preventing additional computational processing on harder tasks. In this work, we present a new framework for iterative reasoning with neural networks. We train a neural network to parameterize an energy landscape over all outputs, and implement each step of the iterative reasoning as an energy minimization step to find a minimal energy solution. By formulating reasoning as an energy minimization problem, for harder problems that lead to more complex energy landscapes, we may then adjust our underlying computational budget by running a more complex optimization procedure. We empirically illustrate that our iterative reasoning approach can solve more accurate and generalizable algorithmic reasoning tasks in both graph and continuous domains. Finally, we illustrate that our approach can recursively solve algorithmic problems requiring nested reasoning.' volume: 162 URL: https://proceedings.mlr.press/v162/du22d.html PDF: https://proceedings.mlr.press/v162/du22d/du22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-du22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yilun family: Du - given: Shuang family: Li - given: Joshua family: Tenenbaum - given: Igor family: Mordatch editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5570-5582 id: du22d issued: date-parts: - 2022 - 6 - 28 firstpage: 5570 lastpage: 5582 published: 2022-06-28 00:00:00 +0000 - title: 'SE(3) Equivariant Graph Neural Networks with Complete Local Frames' abstract: 'Group equivariance (e.g. SE(3) equivariance) is a critical physical symmetry in science, from classical and quantum physics to computational biology. It enables robust and accurate prediction under arbitrary reference transformations. In light of this, great efforts have been put on encoding this symmetry into deep neural networks, which has been shown to improve the generalization performance and data efficiency for downstream tasks. Constructing an equivariant neural network generally brings high computational costs to ensure expressiveness. Therefore, how to better trade-off the expressiveness and computational efficiency plays a core role in the design of the equivariant deep learning models. In this paper, we propose a framework to construct SE(3) equivariant graph neural networks that can approximate the geometric quantities efficiently. Inspired by differential geometry and physics, we introduce equivariant local complete frames to graph neural networks, such that tensor information at given orders can be projected onto the frames. The local frame is constructed to form an orthonormal basis that avoids direction degeneration and ensure completeness. Since the frames are built only by cross product operations, our method is computationally efficient. We evaluate our method on two tasks: Newton mechanics modeling and equilibrium molecule conformation generation. 
Extensive experimental results demonstrate that our model achieves the best or competitive performance in two types of datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/du22e.html PDF: https://proceedings.mlr.press/v162/du22e/du22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-du22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weitao family: Du - given: He family: Zhang - given: Yuanqi family: Du - given: Qi family: Meng - given: Wei family: Chen - given: Nanning family: Zheng - given: Bin family: Shao - given: Tie-Yan family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5583-5608 id: du22e issued: date-parts: - 2022 - 6 - 28 firstpage: 5583 lastpage: 5608 published: 2022-06-28 00:00:00 +0000 - title: 'A Context-Integrated Transformer-Based Neural Network for Auction Design' abstract: 'One of the central problems in auction design is developing an incentive-compatible mechanism that maximizes the auctioneer’s expected revenue. While theoretical approaches have encountered bottlenecks in multi-item auctions, recently, there has been much progress on finding the optimal mechanism through deep learning. However, these works either focus on a fixed set of bidders and items, or restrict the auction to be symmetric. In this work, we overcome such limitations by factoring public contextual information of bidders and items into the auction learning framework. We propose $\mathtt{CITransNet}$, a context-integrated transformer-based neural network for optimal auction design, which maintains permutation-equivariance over bids and contexts while being able to find asymmetric solutions. We show by extensive experiments that $\mathtt{CITransNet}$ can recover the known optimal solutions in single-item settings, outperform strong baselines in multi-item auctions, and generalize well to cases other than those in training.' volume: 162 URL: https://proceedings.mlr.press/v162/duan22a.html PDF: https://proceedings.mlr.press/v162/duan22a/duan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-duan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhijian family: Duan - given: Jingwu family: Tang - given: Yutong family: Yin - given: Zhe family: Feng - given: Xiang family: Yan - given: Manzil family: Zaheer - given: Xiaotie family: Deng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5609-5626 id: duan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5609 lastpage: 5626 published: 2022-06-28 00:00:00 +0000 - title: 'Augment with Care: Contrastive Learning for Combinatorial Problems' abstract: 'Supervised learning can improve the design of state-of-the-art solvers for combinatorial problems, but labelling large numbers of combinatorial instances is often impractical due to exponential worst-case complexity. 
Inspired by the recent success of contrastive pre-training for images, we conduct a scientific study of the effect of augmentation design on contrastive pre-training for the Boolean satisfiability problem. While typical graph contrastive pre-training uses label-agnostic augmentations, our key insight is that many combinatorial problems have well-studied invariances, which allow for the design of label-preserving augmentations. We find that label-preserving augmentations are critical for the success of contrastive pre-training. We show that our representations are able to achieve comparable test accuracy to fully-supervised learning while using only 1% of the labels. We also demonstrate that our representations are more transferable to larger problems from unseen domains. Our code is available at https://github.com/h4duan/contrastive-sat.' volume: 162 URL: https://proceedings.mlr.press/v162/duan22b.html PDF: https://proceedings.mlr.press/v162/duan22b/duan22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-duan22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haonan family: Duan - given: Pashootan family: Vaezipoor - given: Max B family: Paulus - given: Yangjun family: Ruan - given: Chris family: Maddison editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5627-5642 id: duan22b issued: date-parts: - 2022 - 6 - 28 firstpage: 5627 lastpage: 5642 published: 2022-06-28 00:00:00 +0000 - title: 'Parametric Visual Program Induction with Function Modularization' abstract: 'Generating programs to describe visual observations has gained much research attention recently. However, most of the existing approaches are based on non-parametric primitive functions, making them unable to handle complex visual scenes involving many attributes and details. In this paper, we propose the concept of parametric visual program induction. Learning to generate parametric programs for visual scenes is challenging due to the huge number of function variants and the complex function correlations. To solve these challenges, we propose the method of function modularization, capable of dealing with numerous function variants and complex correlations. Specifically, we model each parametric function as a multi-head self-contained neural module to cover different function variants. Moreover, to eliminate the complex correlations between functions, we propose the hierarchical heterogeneous Monte-Carlo tree search (H2MCTS) algorithm which can provide high-quality uncorrelated supervision during training, and serve as an efficient search technique during testing. We demonstrate the superiority of the proposed method on three visual program induction datasets involving parametric primitive functions. Experimental results show that our proposed model is able to significantly outperform the state-of-the-art baseline methods in terms of generating accurate programs.'
volume: 162 URL: https://proceedings.mlr.press/v162/duan22c.html PDF: https://proceedings.mlr.press/v162/duan22c/duan22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-duan22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xuguang family: Duan - given: Xin family: Wang - given: Ziwei family: Zhang - given: Wenwu family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5643-5658 id: duan22c issued: date-parts: - 2022 - 6 - 28 firstpage: 5643 lastpage: 5658 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Deep Embedding Topic Meta-Learner' abstract: 'Existing deep topic models are effective in capturing the latent semantic structures in textual data but usually rely on a plethora of documents. This is less than satisfactory in practical applications when only a limited amount of data is available. In this paper, we propose a novel framework that efficiently solves the problem of topic modeling under the small data regime. Specifically, the framework involves two innovations: a bi-level generative model that aims to exploit the task information to guide the document generation, and a topic meta-learner that strives to learn a group of global topic embeddings so that fast adaptation to the task-specific topic embeddings can be achieved with a few examples. We apply the proposed framework to a hierarchical embedded topic model and achieve better performance than various baseline models on diverse experiments, including few-shot topic discovery and few-shot document classification.' volume: 162 URL: https://proceedings.mlr.press/v162/duan22d.html PDF: https://proceedings.mlr.press/v162/duan22d/duan22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-duan22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhibin family: Duan - given: Yishi family: Xu - given: Jianqiao family: Sun - given: Bo family: Chen - given: Wenchao family: Chen - given: Chaojie family: Wang - given: Mingyuan family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5659-5670 id: duan22d issued: date-parts: - 2022 - 6 - 28 firstpage: 5659 lastpage: 5670 published: 2022-06-28 00:00:00 +0000 - title: 'Deletion Robust Submodular Maximization over Matroids' abstract: 'Maximizing a monotone submodular function is a fundamental task in machine learning. In this paper we study the deletion robust version of the problem under the classic matroid constraint. Here the goal is to extract a small-size summary of the dataset that contains a high-value independent set even after an adversary has deleted some elements. We present constant-factor approximation algorithms, whose space complexity depends on the rank $k$ of the matroid and the number $d$ of deleted elements. In the centralized setting we present a $(3.582+O(\varepsilon))$-approximation algorithm with summary size $O(k + \frac{d}{\varepsilon^2}\log \frac{k}{\varepsilon})$.
In the streaming setting we provide a $(5.582+O(\varepsilon))$-approximation algorithm with summary size and memory $O(k + \frac{d}{\varepsilon^2}\log \frac{k}{\varepsilon})$. We complement our theoretical results with an in-depth experimental analysis showing the effectiveness of our algorithms on real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/duetting22a.html PDF: https://proceedings.mlr.press/v162/duetting22a/duetting22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-duetting22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Paul family: Duetting - given: Federico family: Fusco - given: Silvio family: Lattanzi - given: Ashkan family: Norouzi-Fard - given: Morteza family: Zadimoghaddam editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5671-5693 id: duetting22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5671 lastpage: 5693 published: 2022-06-28 00:00:00 +0000 - title: 'From data to functa: Your data point is a function and you can treat it like one' abstract: 'It is common practice in deep learning to represent a measurement of the world on a discrete grid, e.g. a 2D grid of pixels. However, the underlying signal represented by these measurements is often continuous, e.g. the scene depicted in an image. A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output the appropriate measurement value for any input spatial location. In this paper, we take this idea to its next level: what would it take to perform deep learning on these functions instead, treating them as data? In this context we refer to the data as functa, and propose a framework for deep learning on functa. This view presents a number of challenges around efficient conversion from data to functa, compact representation of functa, and effectively solving downstream tasks on functa. We outline a recipe to overcome these challenges and apply it to a wide range of data modalities including images, 3D shapes, neural radiance fields (NeRF) and data on manifolds. We demonstrate that this approach has various compelling properties across data modalities, in particular on the canonical tasks of generative modeling, data imputation, novel view synthesis and classification.' volume: 162 URL: https://proceedings.mlr.press/v162/dupont22a.html PDF: https://proceedings.mlr.press/v162/dupont22a/dupont22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dupont22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Emilien family: Dupont - given: Hyunjik family: Kim - given: S. M.
Ali family: Eslami - given: Danilo Jimenez family: Rezende - given: Dan family: Rosenbaum editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5694-5725 id: dupont22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5694 lastpage: 5725 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Low Rank Convex Bounds for Pairwise Discrete Graphical Models' abstract: 'In this paper, we extend a Burer-Monteiro style method to compute low rank Semi-Definite Programming (SDP) bounds for the MAP problem on discrete graphical models with an arbitrary number of states and arbitrary pairwise potentials. We consider both a penalized constraint approach and a dedicated Block Coordinate Descent (BCD) approach which avoids large penalty coefficients in the cost matrix. We show our algorithm is decreasing. Experiments show that the BCD approach compares favorably to the penalized approach and to usual linear bounds relying on convergent message passing approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/durante22a.html PDF: https://proceedings.mlr.press/v162/durante22a/durante22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-durante22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Valentin family: Durante - given: George family: Katsirelos - given: Thomas family: Schiex editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5726-5741 id: durante22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5726 lastpage: 5741 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Counterfactual Explanations for Tree-Based Ensembles' abstract: 'Counterfactual explanations inform ways to achieve a desired outcome from a machine learning model. However, such explanations are not robust to certain real-world changes in the underlying model (e.g., retraining the model, changing hyperparameters, etc.), questioning their reliability in several applications, e.g., credit lending. In this work, we propose a novel strategy - that we call RobX - to generate robust counterfactuals for tree-based ensembles, e.g., XGBoost. Tree-based ensembles pose additional challenges in robust counterfactual generation, e.g., they have a non-smooth and non-differentiable objective function, and they can change a lot in the parameter space under retraining on very similar data. We first introduce a novel metric - that we call Counterfactual Stability - that attempts to quantify how robust a counterfactual is going to be to model changes under retraining, and comes with desirable theoretical properties. Our proposed strategy RobX works with any counterfactual generation method (base method) and searches for robust counterfactuals by iteratively refining the counterfactual generated by the base method using our metric Counterfactual Stability. We compare the performance of RobX with popular counterfactual generation methods (for tree-based ensembles) across benchmark datasets. 
The results demonstrate that our strategy generates counterfactuals that are significantly more robust (nearly 100% validity after actual model changes) and also realistic (in terms of local outlier factor) over existing state-of-the-art methods.' volume: 162 URL: https://proceedings.mlr.press/v162/dutta22a.html PDF: https://proceedings.mlr.press/v162/dutta22a/dutta22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dutta22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sanghamitra family: Dutta - given: Jason family: Long - given: Saumitra family: Mishra - given: Cecilia family: Tilli - given: Daniele family: Magazzeni editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5742-5756 id: dutta22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5742 lastpage: 5756 published: 2022-06-28 00:00:00 +0000 - title: 'On the Difficulty of Defending Self-Supervised Learning against Model Extraction' abstract: 'Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels. These representations encode similarity structures that enable efficient learning of multiple downstream tasks. Recently, ML-as-a-Service providers have commenced offering trained SSL models over inference APIs, which transform user inputs into useful representations for a fee. However, the high cost involved to train these models and their exposure over APIs both make black-box extraction a realistic security threat. We thus explore model stealing attacks against SSL. Unlike traditional model extraction on classifiers that output labels, the victim models here output representations; these representations are of significantly higher dimensionality compared to the low-dimensional prediction scores output by classifiers. We construct several novel attacks and find that approaches that train directly on a victim’s stolen representations are query efficient and enable high accuracy for downstream models. We then show that existing defenses against model extraction are inadequate and not easily retrofitted to the specificities of SSL.' volume: 162 URL: https://proceedings.mlr.press/v162/dziedzic22a.html PDF: https://proceedings.mlr.press/v162/dziedzic22a/dziedzic22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-dziedzic22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adam family: Dziedzic - given: Nikita family: Dhawan - given: Muhammad Ahmad family: Kaleem - given: Jonas family: Guan - given: Nicolas family: Papernot editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5757-5776 id: dziedzic22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5757 lastpage: 5776 published: 2022-06-28 00:00:00 +0000 - title: 'LIMO: Latent Inceptionism for Targeted Molecule Generation' abstract: 'Generation of drug-like molecules with high binding affinity to target proteins remains a difficult and resource-intensive task in drug discovery. 
Existing approaches primarily employ reinforcement learning, Markov sampling, or deep generative models guided by Gaussian processes, which can be prohibitively slow when generating molecules with high binding affinity calculated by computationally-expensive physics-based methods. We present Latent Inceptionism on Molecules (LIMO), which significantly accelerates molecule generation with an inceptionism-like technique. LIMO employs a variational autoencoder-generated latent space and property prediction by two neural networks in sequence to enable faster gradient-based reverse-optimization of molecular properties. Comprehensive experiments show that LIMO performs competitively on benchmark tasks and markedly outperforms state-of-the-art techniques on the novel task of generating drug-like compounds with high binding affinity, reaching nanomolar range against two protein targets. We corroborate these docking-based results with more accurate molecular dynamics-based calculations of absolute binding free energy and show that one of our generated drug-like compounds has a predicted $K_D$ (a measure of binding affinity) of $6 \cdot 10^{-14}$ M against the human estrogen receptor, well beyond the affinities of typical early-stage drug candidates and most FDA-approved drugs to their respective targets. Code is available at https://github.com/Rose-STL-Lab/LIMO.' volume: 162 URL: https://proceedings.mlr.press/v162/eckmann22a.html PDF: https://proceedings.mlr.press/v162/eckmann22a/eckmann22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-eckmann22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peter family: Eckmann - given: Kunyang family: Sun - given: Bo family: Zhao - given: Mudong family: Feng - given: Michael family: Gilson - given: Rose family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5777-5792 id: eckmann22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5777 lastpage: 5792 published: 2022-06-28 00:00:00 +0000 - title: 'Inductive Biases and Variable Creation in Self-Attention Mechanisms' abstract: 'Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond. This work provides a theoretical analysis of the inductive biases of self-attention modules. Our focus is to rigorously establish which functions and long-range dependencies self-attention blocks prefer to represent. Our main result shows that bounded-norm Transformer networks "create sparse variables": a single self-attention head can represent a sparse function of the input sequence, with sample complexity scaling only logarithmically with the context length. To support our analysis, we present synthetic experiments to probe the sample complexity of learning sparse Boolean functions with Transformers.' 
volume: 162 URL: https://proceedings.mlr.press/v162/edelman22a.html PDF: https://proceedings.mlr.press/v162/edelman22a/edelman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-edelman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Benjamin L family: Edelman - given: Surbhi family: Goel - given: Sham family: Kakade - given: Cyril family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5793-5831 id: edelman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5793 lastpage: 5831 published: 2022-06-28 00:00:00 +0000 - title: 'Provable Reinforcement Learning with a Short-Term Memory' abstract: 'Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.' volume: 162 URL: https://proceedings.mlr.press/v162/efroni22a.html PDF: https://proceedings.mlr.press/v162/efroni22a/efroni22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-efroni22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yonathan family: Efroni - given: Chi family: Jin - given: Akshay family: Krishnamurthy - given: Sobhan family: Miryoosefi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5832-5850 id: efroni22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5832 lastpage: 5850 published: 2022-06-28 00:00:00 +0000 - title: 'Sparsity in Partially Controllable Linear Systems' abstract: 'A fundamental concept in control theory is that of controllability, where any system state can be reached through an appropriate choice of control inputs. Indeed, a large body of classical and modern approaches are designed for controllable linear dynamical systems. 
However, in practice, we often encounter systems in which a large set of state variables evolve exogenously and independently of the control inputs; such systems are only partially controllable. The focus of this work is on a large class of partially controllable linear dynamical systems, specified by an underlying sparsity pattern. Our main results establish structural conditions and finite-sample guarantees for learning to control such systems. In particular, our structural results characterize those state variables which are irrelevant for optimal control, an analysis which departs from classical control techniques. Our algorithmic results adapt techniques from high-dimensional statistics, specifically soft-thresholding and semiparametric least-squares, to exploit the underlying sparsity pattern in order to obtain finite-sample guarantees that significantly improve over those based on certainty-equivalence. We also corroborate these theoretical improvements over certainty-equivalent control through a simulation study.' volume: 162 URL: https://proceedings.mlr.press/v162/efroni22b.html PDF: https://proceedings.mlr.press/v162/efroni22b/efroni22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-efroni22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yonathan family: Efroni - given: Sham family: Kakade - given: Akshay family: Krishnamurthy - given: Cyril family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5851-5860 id: efroni22b issued: date-parts: - 2022 - 6 - 28 firstpage: 5851 lastpage: 5860 published: 2022-06-28 00:00:00 +0000 - title: 'FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning' abstract: 'Newton-type methods are popular in federated learning due to their fast convergence. Still, they suffer from two main issues, namely: low communication efficiency and low privacy due to the requirement of sending Hessian information from clients to the parameter server (PS). In this work, we introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clients to the PS, hence resolving the bottleneck to improve communication efficiency. In addition, FedNew hides the gradient information and results in a privacy-preserving approach compared to the existing state-of-the-art. The core novel idea in FedNew is to introduce a two-level framework, and alternate between updating the inverse Hessian-gradient product using only one alternating direction method of multipliers (ADMM) step and then performing the global model update using Newton’s method. Though only one ADMM pass is used to approximate the inverse Hessian-gradient product at each iteration, we develop a novel theoretical approach to show the convergence behavior of FedNew for convex problems. Additionally, a significant reduction in communication overhead is achieved by utilizing stochastic quantization. Numerical results using real datasets show the superiority of FedNew compared to existing methods in terms of communication costs.'
volume: 162 URL: https://proceedings.mlr.press/v162/elgabli22a.html PDF: https://proceedings.mlr.press/v162/elgabli22a/elgabli22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-elgabli22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Anis family: Elgabli - given: Chaouki Ben family: Issaid - given: Amrit Singh family: Bedi - given: Ketan family: Rajawat - given: Mehdi family: Bennis - given: Vaneet family: Aggarwal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5861-5877 id: elgabli22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5861 lastpage: 5877 published: 2022-06-28 00:00:00 +0000 - title: 'pathGCN: Learning General Graph Spatial Operators from Paths' abstract: 'Graph Convolutional Networks (GCNs), similarly to Convolutional Neural Networks (CNNs), are typically based on two main operations - spatial and point-wise convolutions. In the context of GCNs, differently from CNNs, a pre-determined spatial operator based on the graph Laplacian is often chosen, allowing only the point-wise operations to be learnt. However, learning a meaningful spatial operator is critical for developing more expressive GCNs for improved performance. In this paper we propose pathGCN, a novel approach to learn the spatial operator from random paths on the graph. We analyze the convergence of our method and its difference from existing GCNs. Furthermore, we discuss several options of combining our learnt spatial operator with point-wise convolutions. Our extensive experiments on numerous datasets suggest that by properly learning both the spatial and point-wise convolutions, phenomena like over-smoothing can be inherently avoided, and new state-of-the-art performance is achieved.' volume: 162 URL: https://proceedings.mlr.press/v162/eliasof22a.html PDF: https://proceedings.mlr.press/v162/eliasof22a/eliasof22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-eliasof22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Moshe family: Eliasof - given: Eldad family: Haber - given: Eran family: Treister editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5878-5891 id: eliasof22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5878 lastpage: 5891 published: 2022-06-28 00:00:00 +0000 - title: 'Discrete Tree Flows via Tree-Structured Permutations' abstract: 'While normalizing flows for continuous data have been extensively researched, flows for discrete data have only recently been explored. These prior models, however, suffer from limitations that are distinct from those of continuous flows. Most notably, discrete flow-based models cannot be straightforwardly optimized with conventional deep learning methods because gradients of discrete functions are undefined or zero. Previous works approximate pseudo-gradients of the discrete functions but do not solve the problem on a fundamental level. 
In addition to that, backpropagation can be computationally burdensome compared to alternative discrete algorithms such as decision tree algorithms. Our approach seeks to reduce computational burden and remove the need for pseudo-gradients by developing a discrete flow based on decision trees—building upon the success of efficient tree-based methods for classification and regression for discrete data. We first define a tree-structured permutation (TSP) that compactly encodes a permutation of discrete data where the inverse is easy to compute; thus, we can efficiently compute the density value and sample new data. We then propose a decision tree algorithm to build TSPs that learns the tree structure and permutations at each node via novel criteria. We empirically demonstrate the feasibility of our method on multiple datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/elkady22a.html PDF: https://proceedings.mlr.press/v162/elkady22a/elkady22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-elkady22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mai family: Elkady - given: Hyung Zin family: Lim - given: David I family: Inouye editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5892-5923 id: elkady22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5892 lastpage: 5923 published: 2022-06-28 00:00:00 +0000 - title: 'For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria' abstract: 'Although it has been known since the 1970s that a globally optimal strategy profile in a common-payoff game is a Nash equilibrium, global optimality is a strict requirement that limits the result’s applicability. In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. Furthermore, we show that this result is robust to perturbations to the common payoff and to the local optimum. Applied to machine learning, our result provides a global guarantee for any gradient method that finds a local optimum in symmetric strategy space. While this result indicates stability to unilateral deviation, we nevertheless identify broad classes of games where mixed local optima are unstable under joint, asymmetric deviations. We analyze the prevalence of instability by running learning algorithms in a suite of symmetric games, and we conclude by discussing the applicability of our results to multi-agent RL, cooperative inverse RL, and decentralized POMDPs.' 
volume: 162 URL: https://proceedings.mlr.press/v162/emmons22a.html PDF: https://proceedings.mlr.press/v162/emmons22a/emmons22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-emmons22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Scott family: Emmons - given: Caspar family: Oesterheld - given: Andrew family: Critch - given: Vincent family: Conitzer - given: Stuart family: Russell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5924-5943 id: emmons22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5924 lastpage: 5943 published: 2022-06-28 00:00:00 +0000 - title: 'Streaming Algorithm for Monotone k-Submodular Maximization with Cardinality Constraints' abstract: 'Maximizing a monotone k-submodular function subject to cardinality constraints is a general model for several applications ranging from influence maximization with multiple products to sensor placement with multiple sensor types and online ad allocation. Due to the large problem scale in many applications and the online nature of ad allocation, a need arises for algorithms that process elements in a streaming fashion and possibly make online decisions. In this work, we develop a new streaming algorithm for maximizing a monotone k-submodular function subject to a per-coordinate cardinality constraint attaining an approximation guarantee close to the state of the art guarantee in the offline setting. Though not typical for streaming algorithms, our streaming algorithm also readily applies to the online setting with free disposal. Our algorithm is combinatorial and enjoys fast running time and small number of function evaluations. Furthermore, its guarantee improves as the cardinality constraints get larger, which is especially suited for the large scale applications. For the special case of maximizing a submodular function with large budgets, our combinatorial algorithm matches the guarantee of the state-of-the-art continuous algorithm, which requires significantly more time and function evaluations.' volume: 162 URL: https://proceedings.mlr.press/v162/ene22a.html PDF: https://proceedings.mlr.press/v162/ene22a/ene22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ene22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alina family: Ene - given: Huy family: Nguyen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5944-5967 id: ene22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5944 lastpage: 5967 published: 2022-06-28 00:00:00 +0000 - title: 'Towards Scaling Difference Target Propagation by Learning Backprop Targets' abstract: 'The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. 
As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32x32.' volume: 162 URL: https://proceedings.mlr.press/v162/ernoult22a.html PDF: https://proceedings.mlr.press/v162/ernoult22a/ernoult22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ernoult22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Maxence M family: Ernoult - given: Fabrice family: Normandin - given: Abhinav family: Moudgil - given: Sean family: Spinney - given: Eugene family: Belilovsky - given: Irina family: Rish - given: Blake family: Richards - given: Yoshua family: Bengio editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5968-5987 id: ernoult22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5968 lastpage: 5987 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information' abstract: 'Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what attributes make the dataset difficult for a given model. To address these questions, we frame dataset difficulty—w.r.t. a model $\mathcal{V}$—as the lack of $\mathcal{V}$-usable information (Xu et al., 2019), where a lower value indicates a more difficult dataset for $\mathcal{V}$. We further introduce pointwise $\mathcal{V}$-information (PVI) for measuring the difficulty of individual instances w.r.t. a given distribution. While standard evaluation metrics typically only compare different models for the same dataset, $\mathcal{V}$-usable information and PVI also permit the converse: for a given model $\mathcal{V}$, we can compare different datasets, as well as different instances/slices of the same dataset. Furthermore, our framework allows for the interpretability of different input attributes via transformations of the input, which we use to discover annotation artefacts in widely-used NLP benchmarks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ethayarajh22a.html PDF: https://proceedings.mlr.press/v162/ethayarajh22a/ethayarajh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ethayarajh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kawin family: Ethayarajh - given: Yejin family: Choi - given: Swabha family: Swayamdipta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 5988-6008 id: ethayarajh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 5988 lastpage: 6008 published: 2022-06-28 00:00:00 +0000 - title: 'Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning' abstract: 'Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method – fine-tuning all parameters of the source model to the target domain – possibly because fine-tuning allows the model to leverage useful information from intermediate layers which is otherwise discarded by the later previously trained layers. We explore the hypothesis that these intermediate layers might be directly exploited. We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target-domain. In evaluations on the Visual Task Adaptation Benchmark-1k, Head2Toe matches performance obtained with fine-tuning on average while reducing training and storage cost hundred folds or more, but critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning. Code used in our experiments can be found in supplementary materials.' volume: 162 URL: https://proceedings.mlr.press/v162/evci22a.html PDF: https://proceedings.mlr.press/v162/evci22a/evci22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-evci22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Utku family: Evci - given: Vincent family: Dumoulin - given: Hugo family: Larochelle - given: Michael C family: Mozer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6009-6033 id: evci22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6009 lastpage: 6033 published: 2022-06-28 00:00:00 +0000 - title: 'Variational Sparse Coding with Learned Thresholding' abstract: 'Sparse coding strategies have been lauded for their parsimonious representations of data that leverage low dimensional structure. However, inference of these codes typically relies on an optimization procedure with poor computational scaling in high-dimensional problems. For example, sparse inference in the representations learned in the high-dimensional intermediary layers of deep neural networks (DNNs) requires an iterative minimization to be performed at each training step. 
As such, recent, quick methods in variational inference have been proposed to infer sparse codes by learning a distribution over the codes with a DNN. In this work, we propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples, avoiding the use of problematic relaxations. We first evaluate and analyze our method by training a linear generator, showing that it has superior performance, statistical efficiency, and gradient estimation compared to other sparse distributions. We then compare to a standard variational autoencoder using a DNN generator on the CelebA dataset.' volume: 162 URL: https://proceedings.mlr.press/v162/fallah22a.html PDF: https://proceedings.mlr.press/v162/fallah22a/fallah22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fallah22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kion family: Fallah - given: Christopher J family: Rozell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6034-6058 id: fallah22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6034 lastpage: 6058 published: 2022-06-28 00:00:00 +0000 - title: 'Training Discrete Deep Generative Models via Gapped Straight-Through Estimator' abstract: 'While deep generative models have succeeded in image processing, natural language processing, and reinforcement learning, training that involves discrete random variables remains challenging due to the high variance of its gradient estimation process. Monte Carlo is a common solution used in most variance reduction approaches. However, this involves time-consuming resampling and multiple function evaluations. We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead. This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax. We determine these properties and show via an ablation study that they are essential. Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks, MNIST-VAE and ListOps.' volume: 162 URL: https://proceedings.mlr.press/v162/fan22a.html PDF: https://proceedings.mlr.press/v162/fan22a/fan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ting-Han family: Fan - given: Ta-Chung family: Chi - given: Alexander I. family: Rudnicky - given: Peter J family: Ramadge editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6059-6073 id: fan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6059 lastpage: 6073 published: 2022-06-28 00:00:00 +0000 - title: 'DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck' abstract: 'Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. 
To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations based on the unsupervised multi-view setting. Specifically, we introduce a novel contrastive version of the Multi-View Information Bottleneck (MIB) objective for temporal data. We train RL agents from pixels with this auxiliary objective to learn robust representations that can compress away task-irrelevant information and are predictive of task-relevant dynamics. This approach enables us to train high-performance policies that are robust to visual distractions and can generalize well to unseen environments. We demonstrate that our approach can achieve SOTA performance on a diverse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark. Our code is open-sourced and available at https://github.com/BU-DEPEND-Lab/DRIBO.' volume: 162 URL: https://proceedings.mlr.press/v162/fan22b.html PDF: https://proceedings.mlr.press/v162/fan22b/fan22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fan22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiameng family: Fan - given: Wenchao family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6074-6102 id: fan22b issued: date-parts: - 2022 - 6 - 28 firstpage: 6074 lastpage: 6102 published: 2022-06-28 00:00:00 +0000 - title: 'Generalized Data Distribution Iteration' abstract: 'Obtaining higher sample efficiency and superior final performance simultaneously has been one of the major challenges for deep reinforcement learning (DRL). Previous work could handle one of these challenges but typically failed to address them concurrently. In this paper, we try to tackle these two challenges simultaneously. To achieve this, we first decouple these challenges into two classic RL problems: data richness and the exploration-exploitation trade-off. Then, we cast these two problems into the training data distribution optimization problem, namely to obtain desired training data within limited interactions, and address them concurrently via i) explicit modeling and control of the capacity and diversity of the behavior policy and ii) more fine-grained and adaptive control of the selective/sampling distribution of the behavior policy using a monotonic data distribution optimization. Finally, we integrate this process into Generalized Policy Iteration (GPI) and obtain a more general framework called Generalized Data Distribution Iteration (GDI). We use the GDI framework to introduce operator-based versions of well-known RL methods from DQN to Agent57. We provide a theoretical guarantee of the superiority of GDI over GPI. We also demonstrate our state-of-the-art (SOTA) performance on the Arcade Learning Environment (ALE), wherein our algorithm has achieved a 9620.98% mean human normalized score (HNS), a 1146.39% median HNS, and surpassed 22 human world records using only 200M training frames. Our performance is comparable to Agent57’s while consuming 500 times less data.
We argue that there is still a long way to go before obtaining real superhuman agents in ALE.' volume: 162 URL: https://proceedings.mlr.press/v162/fan22c.html PDF: https://proceedings.mlr.press/v162/fan22c/fan22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fan22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiajun family: Fan - given: Changnan family: Xiao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6103-6184 id: fan22c issued: date-parts: - 2022 - 6 - 28 firstpage: 6103 lastpage: 6184 published: 2022-06-28 00:00:00 +0000 - title: 'Variational Wasserstein gradient flow' abstract: 'Wasserstein gradient flow has emerged as a promising approach to solve optimization problems over the space of probability distributions. A recent trend is to use the well-known JKO scheme in combination with input convex neural networks to numerically implement the proximal step. The most challenging step, in this setup, is to evaluate functions involving density explicitly, such as entropy, in terms of samples. This paper builds on the recent works with a slight but crucial difference: we propose to utilize a variational formulation of the objective function formulated as maximization over a parametric class of functions. Theoretically, the proposed variational formulation allows the construction of gradient flows directly for empirical distributions with a well-defined and meaningful objective function. Computationally, this approach replaces the computationally expensive step in existing methods, to handle objective functions involving density, with inner loop updates that only require a small batch of samples and scale well with the dimension. The performance and scalability of the proposed method are illustrated with the aid of several numerical experiments involving high-dimensional synthetic and real datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/fan22d.html PDF: https://proceedings.mlr.press/v162/fan22d/fan22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fan22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiaojiao family: Fan - given: Qinsheng family: Zhang - given: Amirhossein family: Taghvaei - given: Yongxin family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6185-6215 id: fan22d issued: date-parts: - 2022 - 6 - 28 firstpage: 6185 lastpage: 6215 published: 2022-06-28 00:00:00 +0000 - title: 'Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)' abstract: 'Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. 
Concretely, we study five different possible causes for the robustness gains: (i) the training set size, (ii) the training distribution, (iii) language supervision at training time, (iv) language supervision at test time, and (v) the contrastive loss function. Our experiments show that the more diverse training distribution is the main cause for the robustness gains, with the other factors contributing little to no robustness. Beyond our experimental results, we also introduce ImageNet-Captions, a version of ImageNet with original text annotations from Flickr, to enable further controlled experiments of language-image training.' volume: 162 URL: https://proceedings.mlr.press/v162/fang22a.html PDF: https://proceedings.mlr.press/v162/fang22a/fang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alex family: Fang - given: Gabriel family: Ilharco - given: Mitchell family: Wortsman - given: Yuhao family: Wan - given: Vaishaal family: Shankar - given: Achal family: Dave - given: Ludwig family: Schmidt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6216-6234 id: fang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6216 lastpage: 6234 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Continuous-Time Tucker Decomposition' abstract: 'Tensor decomposition is a dominant framework for multiway data analysis and prediction. Although practical data often contains timestamps for the observed entries, existing tensor decomposition approaches overlook or under-use this valuable time information. They either drop the timestamps or bin them into crude steps, and hence ignore the temporal dynamics within each step, or use simple parametric time coefficients. To overcome these limitations, we propose Bayesian Continuous-Time Tucker Decomposition. We model the tensor-core of the classical Tucker decomposition as a time-varying function, and place a Gaussian process prior over it to flexibly estimate all kinds of temporal dynamics. In this way, our model maintains interpretability while being flexible enough to capture various complex temporal relationships between the tensor nodes. For efficient and high-quality posterior inference, we use the stochastic differential equation (SDE) representation of temporal GPs to build an equivalent state-space prior, which avoids huge kernel matrix computation and sparse/low-rank approximations. We then use Kalman filtering, RTS smoothing, and conditional moment matching to develop a scalable message passing inference algorithm. We show the advantage of our method in simulation and several real-world applications.'
volume: 162 URL: https://proceedings.mlr.press/v162/fang22b.html PDF: https://proceedings.mlr.press/v162/fang22b/fang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shikai family: Fang - given: Akil family: Narayan - given: Robert family: Kirby - given: Shandian family: Zhe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6235-6245 id: fang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 6235 lastpage: 6245 published: 2022-06-28 00:00:00 +0000 - title: 'Byzantine Machine Learning Made Easy By Resilient Averaging of Momentums' abstract: 'Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., Byzantine) workers. Although a myriad of techniques addressing the problem have been proposed, the field arguably rests on fragile foundations. These techniques are hard to prove correct and rely on assumptions that are (a) quite unrealistic, i.e., often violated in practice, and (b) heterogeneous, i.e., making it difficult to compare approaches. We present RESAM (RESilient Averaging of Momentums), a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. Our framework is mainly composed of two operators: resilient averaging at the server and distributed momentum at the workers. We prove a general theorem stating the convergence of distributed SGD under RESAM. Interestingly, demonstrating and comparing the convergence of many existing techniques become direct corollaries of our theorem, without resorting to stringent assumptions. We also present an empirical evaluation of the practical relevance of RESAM.' volume: 162 URL: https://proceedings.mlr.press/v162/farhadkhani22a.html PDF: https://proceedings.mlr.press/v162/farhadkhani22a/farhadkhani22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-farhadkhani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sadegh family: Farhadkhani - given: Rachid family: Guerraoui - given: Nirupam family: Gupta - given: Rafael family: Pinot - given: John family: Stephan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6246-6283 id: farhadkhani22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6246 lastpage: 6283 published: 2022-06-28 00:00:00 +0000 - title: 'An Equivalence Between Data Poisoning and Byzantine Gradient Attacks' abstract: 'To study the resilience of distributed learning, the “Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. 
In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience of any “robust” learning algorithm to data poisoning in highly heterogeneous applications, as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models.' volume: 162 URL: https://proceedings.mlr.press/v162/farhadkhani22b.html PDF: https://proceedings.mlr.press/v162/farhadkhani22b/farhadkhani22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-farhadkhani22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sadegh family: Farhadkhani - given: Rachid family: Guerraoui - given: Lê Nguyên family: Hoang - given: Oscar family: Villemaud editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6284-6323 id: farhadkhani22b issued: date-parts: - 2022 - 6 - 28 firstpage: 6284 lastpage: 6323 published: 2022-06-28 00:00:00 +0000 - title: 'Investigating Generalization by Controlling Normalized Margin' abstract: 'Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$. Since standard neural net optimizers do not control normalized margin, it is hard to test whether this quantity causally relates to generalization. This paper designs a series of experimental studies that explicitly control normalized margin and thereby tackle two central questions. First: does normalized margin always have a causal effect on generalization? The paper finds that no—networks can be produced where normalized margin has seemingly no relationship with generalization, counter to the theory of Bartlett et al. (2017). Second: does normalized margin ever have a causal effect on generalization? The paper finds that yes—in a standard training setup, test performance closely tracks normalized margin. The paper suggests a Gaussian process model as a promising explanation for this behavior.' 
volume: 162 URL: https://proceedings.mlr.press/v162/farhang22a.html PDF: https://proceedings.mlr.press/v162/farhang22a/farhang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-farhang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexander R family: Farhang - given: Jeremy D family: Bernstein - given: Kushal family: Tirumala - given: Yang family: Liu - given: Yisong family: Yue editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6324-6336 id: farhang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6324 lastpage: 6336 published: 2022-06-28 00:00:00 +0000 - title: 'Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games' abstract: 'While extensive-form games (EFGs) can be converted into normal-form games (NFGs), doing so comes at the cost of an exponential blowup of the strategy space. So, progress on NFGs and EFGs has historically followed separate tracks, with the EFG community often having to catch up with advances (e.g., last-iterate convergence and predictive regret bounds) from the larger NFG community. In this paper we show that the Optimistic Multiplicative Weights Update (OMWU) algorithm—the premier learning algorithm for NFGs—can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick. The resulting algorithm, Kernelized OMWU (KOMWU), applies more broadly to all convex games whose strategy space is a polytope with 0/1 integral vertices, as long as the kernel can be evaluated efficiently. In the particular case of EFGs, KOMWU closes several standing gaps between NFG and EFG learning, by enabling direct, black-box transfer to EFGs of desirable properties of learning dynamics that were so far known to be achievable only in NFGs. Specifically, KOMWU gives the first algorithm that guarantees at the same time last-iterate convergence, lower dependence on the size of the game tree than all prior algorithms, and $\tilde{O}(1)$ regret when followed by all players.' volume: 162 URL: https://proceedings.mlr.press/v162/farina22a.html PDF: https://proceedings.mlr.press/v162/farina22a/farina22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-farina22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Gabriele family: Farina - given: Chung-Wei family: Lee - given: Haipeng family: Luo - given: Christian family: Kroer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6337-6357 id: farina22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6337 lastpage: 6357 published: 2022-06-28 00:00:00 +0000 - title: 'Local Linear Convergence of Douglas-Rachford for Linear Programming: a Probabilistic Analysis' abstract: 'Douglas-Rachford splitting/ADMM (henceforth DRS) is a very popular algorithm for solving convex optimisation problems to low or moderate accuracy, and in particular for solving large-scale linear programs.
Despite recent progress, obtaining highly accurate solutions to linear programs with DRS remains elusive. In this paper we analyze the local linear convergence rate $r$ of the DRS method for random linear programs, and give explicit and tight bounds on $r$. We show that $1-r^2$ is typically of the order of $m^{-1}(n-m)^{-1}$, where $n$ is the number of variables and $m$ is the number of constraints. This provides a quantitative explanation for the very slow convergence of DRS/ADMM on random LPs. The proof of our result relies on an established characterisation of the linear rate of convergence as the cosine of the Friedrichs angle between two subspaces associated to the problem. We also show that the cosecant of this angle can be interpreted as a condition number for the LP.' volume: 162 URL: https://proceedings.mlr.press/v162/faust22a.html PDF: https://proceedings.mlr.press/v162/faust22a/faust22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-faust22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Oisin family: Faust - given: Hamza family: Fawzi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6358-6372 id: faust22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6358 lastpage: 6372 published: 2022-06-28 00:00:00 +0000 - title: 'Matching Structure for Dual Learning' abstract: 'Many natural language processing (NLP) tasks appear in dual forms, which are generally solved by dual learning technique that models the dualities between the coupled tasks. In this work, we propose to further enhance dual learning with structure matching that explicitly builds structural connections in between. Starting with the dual text$\leftrightarrow$text generation, we perform dually-syntactic structure co-echoing of the region of interest (RoI) between the task pair, together with a syntax cross-reconstruction at the decoding side. We next extend the idea to a text$\leftrightarrow$non-text setup, making alignment between the syntactic-semantic structure. Over 2*14 tasks covering 5 dual learning scenarios, the proposed structure matching method shows its significant effectiveness in enhancing existing dual learning. Our method can retrieve the key RoIs that are highly crucial to the task performance. Besides NLP tasks, it is also revealed that our approach has great potential in facilitating more non-text$\leftrightarrow$non-text scenarios.'
volume: 162 URL: https://proceedings.mlr.press/v162/fei22a.html PDF: https://proceedings.mlr.press/v162/fei22a/fei22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fei22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hao family: Fei - given: Shengqiong family: Wu - given: Yafeng family: Ren - given: Meishan family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6373-6391 id: fei22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6373 lastpage: 6391 published: 2022-06-28 00:00:00 +0000 - title: 'Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning' abstract: 'In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.' volume: 162 URL: https://proceedings.mlr.press/v162/fei22b.html PDF: https://proceedings.mlr.press/v162/fei22b/fei22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fei22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yingjie family: Fei - given: Ruitu family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6392-6417 id: fei22b issued: date-parts: - 2022 - 6 - 28 firstpage: 6392 lastpage: 6417 published: 2022-06-28 00:00:00 +0000 - title: 'Private frequency estimation via projective geometry' abstract: 'In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For a universe of size $k$ and with $n$ users, our $\varepsilon$-LDP algorithm has communication cost $\lceil\log_2 k\rceil$ and computation cost $O(n + k e^{\varepsilon} \log k)$ for the server to approximately reconstruct the frequency histogram, while achieving an optimal privacy-utility tradeoff. In many practical settings this is a significant improvement over the $O(n + k^2)$ computation cost that is achieved by the recent PI-RAPPOR algorithm (Feldman and Talwar; 2021). Our empirical evaluation shows a speedup of over 50x over PI-RAPPOR while using approximately 75x less memory. In addition, the running time of our algorithm is comparable to that of HadamardResponse (Acharya, Sun, and Zhang; 2019) and RecursiveHadamardResponse (Chen, Kairouz, and Ozgur; 2020) which have significantly worse reconstruction error. The error of our algorithm essentially matches that of the communication- and time-inefficient but utility-optimal SubsetSelection (SS) algorithm (Ye and Barg; 2017).
Our new algorithm is based on using Projective Planes over a finite field to define a small collection of sets that are close to being pairwise independent and a dynamic programming algorithm for approximate histogram reconstruction for the server.' volume: 162 URL: https://proceedings.mlr.press/v162/feldman22a.html PDF: https://proceedings.mlr.press/v162/feldman22a/feldman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-feldman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vitaly family: Feldman - given: Jelani family: Nelson - given: Huy family: Nguyen - given: Kunal family: Talwar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6418-6433 id: feldman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6418 lastpage: 6433 published: 2022-06-28 00:00:00 +0000 - title: 'An Intriguing Property of Geophysics Inversion' abstract: 'Inversion techniques are widely used to reconstruct subsurface physical properties (e.g., velocity, conductivity) from surface-based geophysical measurements (e.g., seismic, electric/magnetic (EM) data). The problems are governed by partial differential equations (PDEs) like the wave or Maxwell’s equations. Solving geophysical inversion problems is challenging due to the ill-posedness and high computational cost. To alleviate those issues, recent studies leverage deep neural networks to learn the inversion mappings from measurements to the property directly. In this paper, we show that such a mapping can be well modeled by a very shallow (but not wide) network with only five layers. This is achieved based on our new finding of an intriguing property: a near-linear relationship between the input and output, after applying integral transform in high dimensional space. In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integral results of velocity with Gaussian kernels are linearly correlated to the integral of seismic data with sine kernels. Furthermore, this property can be easily turned into a light-weight encoder-decoder network for inversion. The encoder contains the integration of seismic data and the linear transformation without need for fine-tuning. The decoder only consists of a single transformer block to reverse the integral of velocity. Experiments show that this interesting property holds for two geophysics inversion problems over four different datasets. 
Compared to the much deeper InversionNet, our method achieves comparable accuracy but uses significantly fewer parameters.' volume: 162 URL: https://proceedings.mlr.press/v162/feng22a.html PDF: https://proceedings.mlr.press/v162/feng22a/feng22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-feng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yinan family: Feng - given: Yinpeng family: Chen - given: Shihang family: Feng - given: Peng family: Jin - given: Zicheng family: Liu - given: Youzuo family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6434-6446 id: feng22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6434 lastpage: 6446 published: 2022-06-28 00:00:00 +0000 - title: 'Principled Knowledge Extrapolation with GANs' abstract: 'Humans extrapolate well, generalizing everyday knowledge to unseen scenarios and raising and answering counterfactual questions. To imitate this ability via generative models, previous works have extensively studied explicitly encoding Structural Causal Models (SCMs) into architectures of generator networks. This methodology, however, limits the flexibility of the generator, which must be carefully crafted to follow the causal graph, and demands a ground-truth SCM with a strong ignorability assumption as a prior, which is nontrivial in many real scenarios. Thus, many current causal GAN methods fail to generate high fidelity counterfactual results as they cannot easily leverage state-of-the-art generative models. In this paper, we propose to study counterfactual synthesis from a new perspective of knowledge extrapolation, where a given knowledge dimension of the data distribution is extrapolated, but the remaining knowledge is kept indistinguishable from the original distribution. We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem, and a novel principal knowledge descent method can efficiently estimate the extrapolated distribution through the adversarial game. Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.' volume: 162 URL: https://proceedings.mlr.press/v162/feng22b.html PDF: https://proceedings.mlr.press/v162/feng22b/feng22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-feng22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruili family: Feng - given: Jie family: Xiao - given: Kecheng family: Zheng - given: Deli family: Zhao - given: Jingren family: Zhou - given: Qibin family: Sun - given: Zheng-Jun family: Zha editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6447-6464 id: feng22b issued: date-parts: - 2022 - 6 - 28 firstpage: 6447 lastpage: 6464 published: 2022-06-28 00:00:00 +0000 - title: 'A Resilient Distributed Boosting Algorithm' abstract: 'Given a learning task where the data is distributed among several parties, communication is one of the fundamental resources which the parties would like to minimize.
We present a distributed boosting algorithm which is resilient to a limited amount of noise. Our algorithm is similar to classical boosting algorithms, although it is equipped with a new component, inspired by Impagliazzo’s hard-core lemma (Impagliazzo, 1995), adding a robustness quality to the algorithm. We also complement this result by showing that resilience to any asymptotically larger noise is not achievable by a communication-efficient algorithm.' volume: 162 URL: https://proceedings.mlr.press/v162/filmus22a.html PDF: https://proceedings.mlr.press/v162/filmus22a/filmus22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-filmus22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuval family: Filmus - given: Idan family: Mehalel - given: Shay family: Moran editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6465-6473 id: filmus22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6465 lastpage: 6473 published: 2022-06-28 00:00:00 +0000 - title: 'Model-Value Inconsistency as a Signal for Epistemic Uncertainty' abstract: 'Using a model of the environment and a value function, an agent can construct many estimates of a state’s value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). Consequently, the discrepancy between these estimates can be used as a proxy for the agent’s epistemic uncertainty; we term this signal model-value inconsistency or self-inconsistency for short. Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms. We provide empirical evidence in both tabular and function approximation settings from pixels that self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a learned model.' 
volume: 162 URL: https://proceedings.mlr.press/v162/filos22a.html PDF: https://proceedings.mlr.press/v162/filos22a/filos22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-filos22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Angelos family: Filos - given: Eszter family: Vértes - given: Zita family: Marinho - given: Gregory family: Farquhar - given: Diana family: Borsa - given: Abram family: Friesen - given: Feryal family: Behbahani - given: Tom family: Schaul - given: Andre family: Barreto - given: Simon family: Osindero editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6474-6498 id: filos22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6474 lastpage: 6498 published: 2022-06-28 00:00:00 +0000 - title: 'Coordinated Double Machine Learning' abstract: 'Double machine learning is a statistical method for leveraging complex black-box models to construct approximately unbiased treatment effect estimates given observational data with high-dimensional covariates, under the assumption of a partially linear model. The idea is to first fit on a subset of the samples two non-linear predictive models, one for the continuous outcome of interest and one for the observed treatment, and then to estimate a linear coefficient for the treatment using the remaining samples through a simple orthogonalized regression. While this methodology is flexible and can accommodate arbitrary predictive models, typically trained independently of one another, this paper argues that a carefully coordinated learning algorithm for deep neural networks may reduce the estimation bias. The improved empirical performance of the proposed method is demonstrated through numerical experiments on both simulated and real data.' volume: 162 URL: https://proceedings.mlr.press/v162/fingerhut22a.html PDF: https://proceedings.mlr.press/v162/fingerhut22a/fingerhut22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fingerhut22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nitai family: Fingerhut - given: Matteo family: Sesia - given: Yaniv family: Romano editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6499-6513 id: fingerhut22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6499 lastpage: 6513 published: 2022-06-28 00:00:00 +0000 - title: 'Conformal Prediction Sets with Limited False Positives' abstract: 'We develop a new approach to multi-label conformal prediction in which we aim to output a precise set of promising prediction candidates with a bounded number of incorrect answers. Standard conformal prediction provides the ability to adapt to model uncertainty by constructing a calibrated candidate set in place of a single prediction, with guarantees that the set contains the correct answer with high probability. In order to obey this coverage property, however, conformal sets can become inundated with noisy candidates—which can render them unhelpful in practice. 
This is particularly relevant to practical applications where there is a limited budget, and the cost (monetary or otherwise) associated with false positives is non-negligible. We propose to trade coverage for a notion of precision by enforcing that the presence of incorrect candidates in the predicted conformal sets (i.e., the total number of false positives) is bounded according to a user-specified tolerance. Subject to this constraint, our algorithm then optimizes for a generalized notion of set coverage (i.e., the true positive rate) that allows for any number of true answers for a given query (including zero). We demonstrate the effectiveness of this approach across a number of classification tasks in natural language processing, computer vision, and computational chemistry.' volume: 162 URL: https://proceedings.mlr.press/v162/fisch22a.html PDF: https://proceedings.mlr.press/v162/fisch22a/fisch22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fisch22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adam family: Fisch - given: Tal family: Schuster - given: Tommi family: Jaakkola - given: Regina family: Barzilay editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6514-6532 id: fisch22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6514 lastpage: 6532 published: 2022-06-28 00:00:00 +0000 - title: 'Fast Population-Based Reinforcement Learning on a Single Machine' abstract: 'Training populations of agents has demonstrated great promise in Reinforcement Learning for stabilizing training, improving exploration and asymptotic performance, and generating a diverse set of solutions. However, population-based training is often not considered by practitioners as it is perceived to be either prohibitively slow (when implemented sequentially), or computationally expensive (if agents are trained in parallel on independent accelerators). In this work, we compare implementations and revisit previous studies to show that the judicious use of compilation and vectorization allows population-based training to be performed on a single machine with one accelerator with minimal overhead compared to training a single agent. We also show that, when provided with a few accelerators, our protocols extend to large population sizes for applications such as hyperparameter tuning. We hope that this work and the public release of our code will encourage practitioners to use population-based learning techniques more frequently for their research and applications.'
volume: 162 URL: https://proceedings.mlr.press/v162/flajolet22a.html PDF: https://proceedings.mlr.press/v162/flajolet22a/flajolet22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-flajolet22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Arthur family: Flajolet - given: Claire Bizon family: Monroc - given: Karim family: Beguir - given: Thomas family: Pierrot editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6533-6547 id: flajolet22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6533 lastpage: 6547 published: 2022-06-28 00:00:00 +0000 - title: 'Fast Relative Entropy Coding with A* coding' abstract: 'Relative entropy coding (REC) algorithms encode a sample from a target distribution Q using a proposal distribution P, such that the expected codelength is $O(\mathrm{KL}[Q \| P])$. REC can be seamlessly integrated with existing learned compression models since, unlike entropy coding, it does not assume discrete Q or P, and does not require quantisation. However, general REC algorithms require an intractable $\Omega(\exp(\mathrm{KL}[Q \| P]))$ runtime. We introduce AS* and AD* coding, two REC algorithms based on A* sampling. We prove that, for continuous distributions over the reals, if the density ratio is unimodal, AS* has $O(D_\infty[Q \| P])$ expected runtime, where $D_\infty[Q \| P]$ is the Rényi $\infty$-divergence. We provide experimental evidence that AD* also has $O(D_\infty[Q \| P])$ expected runtime. We prove that AS* and AD* achieve an expected codelength of $O(\mathrm{KL}[Q \| P])$. Further, we introduce DAD*, an approximate algorithm based on AD* which retains its favourable runtime and has bias similar to that of alternative methods. Focusing on VAEs, we propose the IsoKL VAE (IKVAE), which can be used with DAD* to further improve compression efficiency. We evaluate A* coding with (IK)VAEs on MNIST, showing that it can losslessly compress images near the theoretically optimal limit.' volume: 162 URL: https://proceedings.mlr.press/v162/flamich22a.html PDF: https://proceedings.mlr.press/v162/flamich22a/flamich22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-flamich22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Gergely family: Flamich - given: Stratis family: Markou - given: Jose Miguel family: Hernandez-Lobato editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6548-6577 id: flamich22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6548 lastpage: 6577 published: 2022-06-28 00:00:00 +0000 - title: 'Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness' abstract: 'Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges.
We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, and we prove counterfactual identifiability of CoMP under additional assumptions. We demonstrate state-of-the-art performance on a set of challenging tasks including aligning human tumour samples with cancer cell-lines, predicting transcriptome-level perturbation responses, and batch correction on single-cell RNA sequencing data. We also find parallels to fair representation learning and demonstrate that CoMP is competitive on a common task in the field.' volume: 162 URL: https://proceedings.mlr.press/v162/foster22a.html PDF: https://proceedings.mlr.press/v162/foster22a/foster22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-foster22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adam family: Foster - given: Arpi family: Vezer - given: Craig A. family: Glastonbury - given: Paidi family: Creed - given: Samer family: Abujudeh - given: Aaron family: Sim editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6578-6621 id: foster22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6578 lastpage: 6621 published: 2022-06-28 00:00:00 +0000 - title: 'Label Ranking through Nonparametric Regression' abstract: 'Label Ranking (LR) corresponds to the problem of learning a hypothesis that maps features to rankings over a finite set of labels. We adopt a nonparametric regression approach to LR and obtain theoretical performance guarantees for this fundamental practical problem. We introduce a generative model for Label Ranking, in noiseless and noisy nonparametric regression settings, and provide sample complexity bounds for learning algorithms in both cases. In the noiseless setting, we study the LR problem with full rankings and provide computationally efficient algorithms using decision trees and random forests in the high-dimensional regime. In the noisy setting, we consider the more general cases of LR with incomplete and partial rankings from a statistical viewpoint and obtain sample complexity bounds using the One-Versus-One approach of multiclass classification. Finally, we complement our theoretical contributions with experiments, aiming to understand how the input regression noise affects the observed output.' 
volume: 162 URL: https://proceedings.mlr.press/v162/fotakis22a.html PDF: https://proceedings.mlr.press/v162/fotakis22a/fotakis22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fotakis22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dimitris family: Fotakis - given: Alkis family: Kalavasis - given: Eleni family: Psaroudaki editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6622-6659 id: fotakis22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6622 lastpage: 6659 published: 2022-06-28 00:00:00 +0000 - title: 'A Neural Tangent Kernel Perspective of GANs' abstract: 'We propose a novel theoretical framework of analysis for Generative Adversarial Networks (GANs). We reveal a fundamental flaw of previous analyses which, by incorrectly modeling GANs’ training scheme, are subject to ill-defined discriminator gradients. We overcome this issue which impedes a principled study of GAN training, solving it within our framework by taking into account the discriminator’s architecture. To this end, we leverage the theory of infinite-width neural networks for the discriminator via its Neural Tangent Kernel. We characterize the trained discriminator for a wide range of losses and establish general differentiability properties of the network. From this, we derive new insights about the convergence of the generated distribution, advancing our understanding of GANs’ training dynamics. We empirically corroborate these results via an analysis toolkit based on our framework, unveiling intuitions that are consistent with GAN practice.' volume: 162 URL: https://proceedings.mlr.press/v162/franceschi22a.html PDF: https://proceedings.mlr.press/v162/franceschi22a/franceschi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-franceschi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jean-Yves family: Franceschi - given: Emmanuel family: De Bézenac - given: Ibrahim family: Ayed - given: Mickael family: Chen - given: Sylvain family: Lamprier - given: Patrick family: Gallinari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6660-6704 id: franceschi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6660 lastpage: 6704 published: 2022-06-28 00:00:00 +0000 - title: 'Extracting Latent State Representations with Linear Dynamics from Rich Observations' abstract: 'Recently, many reinforcement learning techniques have been shown to have provable guarantees in the simple case of linear dynamics, especially in problems like linear quadratic regulators. However, in practice many tasks require learning a policy from rich, high-dimensional features such as images, which are unlikely to be linear. We consider a setting where there is a hidden linear subspace of the high-dimensional feature space in which the dynamics are linear. We design natural objectives based on forward and inverse dynamics models. We prove that these objectives can be efficiently optimized and their local optimizers extract the hidden linear subspace. 
We empirically verify our theoretical results with synthetic data and explore the effectiveness of our approach (generalized to nonlinear settings) in simple control tasks with rich observations.' volume: 162 URL: https://proceedings.mlr.press/v162/frandsen22a.html PDF: https://proceedings.mlr.press/v162/frandsen22a/frandsen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-frandsen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Abraham family: Frandsen - given: Rong family: Ge - given: Holden family: Lee editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6705-6725 id: frandsen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6705 lastpage: 6725 published: 2022-06-28 00:00:00 +0000 - title: 'SPDY: Accurate Pruning with Speedup Guarantees' abstract: 'The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly-growing computational support for efficiently executing the unstructured-sparse models obtained via pruning. Yet, most existing pruning methods minimize just the number of remaining weights, i.e. the size of the model, rather than optimizing for inference time. We address this gap by introducing SPDY, a new compression method which automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system, while minimizing accuracy loss. SPDY is the composition of two new techniques. The first is an efficient and general dynamic programming algorithm for solving constrained layer-wise compression problems, given a set of layer-wise error scores. The second technique is a local search procedure for automatically determining such scores in an accurate and robust manner. Experiments across popular vision and language models show that SPDY guarantees speedups while recovering higher accuracy relative to existing strategies, both for one-shot and gradual pruning scenarios, and is compatible with most existing pruning approaches. We also extend our approach to the recently-proposed task of pruning with very little data, where we achieve the best known accuracy recovery when pruning to the GPU-supported 2:4 sparsity pattern.' 
volume: 162 URL: https://proceedings.mlr.press/v162/frantar22a.html PDF: https://proceedings.mlr.press/v162/frantar22a/frantar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-frantar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Elias family: Frantar - given: Dan family: Alistarh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6726-6743 id: frantar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6726 lastpage: 6743 published: 2022-06-28 00:00:00 +0000 - title: 'Revisiting the Effects of Stochasticity for Hamiltonian Samplers' abstract: 'We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDES) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis for the effect of mini-batches through the lens of differential operator splitting, revising previous literature results. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is $\mathcal{O}(\eta^2)$, with $\eta$ being the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.' volume: 162 URL: https://proceedings.mlr.press/v162/franzese22a.html PDF: https://proceedings.mlr.press/v162/franzese22a/franzese22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-franzese22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Giulio family: Franzese - given: Dimitrios family: Milios - given: Maurizio family: Filippone - given: Pietro family: Michiardi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6744-6778 id: franzese22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6744 lastpage: 6778 published: 2022-06-28 00:00:00 +0000 - title: 'Bregman Neural Networks' abstract: 'We present a framework based on bilevel optimization for learning multilayer, deep data representations. On the one hand, the lower-level problem finds a representation by successively minimizing layer-wise objectives made of the sum of a prescribed regularizer as well as a fidelity term and some linear function both depending on the representation found at the previous layer. On the other hand, the upper-level problem optimizes over the linear functions to yield a linearly separable final representation. We show that, by choosing the fidelity term as the quadratic distance between two successive layer-wise representations, the bilevel problem reduces to the training of a feed-forward neural network. 
Instead, by elaborating on Bregman distances, we devise a novel neural network architecture additionally involving the inverse of the activation function reminiscent of the skip connection used in ResNets. Numerical experiments suggest that the proposed Bregman variant benefits from better learning properties and more robust prediction performance.' volume: 162 URL: https://proceedings.mlr.press/v162/frecon22a.html PDF: https://proceedings.mlr.press/v162/frecon22a/frecon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-frecon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jordan family: Frecon - given: Gilles family: Gasso - given: Massimiliano family: Pontil - given: Saverio family: Salzo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6779-6792 id: frecon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6779 lastpage: 6792 published: 2022-06-28 00:00:00 +0000 - title: '(Non-)Convergence Results for Predictive Coding Networks' abstract: 'Predictive coding networks (PCNs) are (un)supervised learning models, coming from neuroscience, that approximate how the brain works. One major open problem around PCNs is their convergence behavior. In this paper, we use dynamical systems theory to formally investigate the convergence of PCNs as they are used in machine learning. Doing so, we put their theory on a firm, rigorous basis, by developing a precise mathematical framework for PCN and show that for sufficiently small weights and initializations, PCNs converge for any input. Thereby, we provide the theoretical assurance that previous implementations, whose convergence was assessed solely by numerical experiments, can indeed capture the correct behavior of PCNs. Outside of the identified regime of small weights and small initializations, we show via a counterexample that PCNs can diverge, countering common beliefs held in the community. This is achieved by identifying a Neimark-Sacker bifurcation in a PCN of small size, which gives rise to an unstable fixed point and an invariant curve around it.' volume: 162 URL: https://proceedings.mlr.press/v162/frieder22a.html PDF: https://proceedings.mlr.press/v162/frieder22a/frieder22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-frieder22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Simon family: Frieder - given: Thomas family: Lukasiewicz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6793-6810 id: frieder22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6793 lastpage: 6810 published: 2022-06-28 00:00:00 +0000 - title: 'Scaling Structured Inference with Randomization' abstract: 'Deep discrete structured models have seen considerable progress recently, but traditional inference using dynamic programming (DP) typically works with a small number of states (less than hundreds), which severely limits model capacity. 
At the same time, across machine learning, there is a recent trend of using randomized truncation techniques to accelerate computations involving large sums. Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference (partition, marginal, reparameterization, entropy) and different graph structures (chains, trees, and more general hypergraphs). It is also compatible with automatic differentiation: it can be integrated with neural networks seamlessly and learned with gradient-based optimizers. Our core technique approximates the sum-product by restricting and reweighting DP on a small subset of nodes, which reduces computation by orders of magnitude. We further achieve low bias and variance via Rao-Blackwellization and importance sampling. Experiments over different graphs demonstrate the accuracy and efficiency of our approach. Furthermore, when using RDP for training a structured variational autoencoder with a scaled inference network, we achieve better test likelihood than baselines and successfully prevent posterior collapse.' volume: 162 URL: https://proceedings.mlr.press/v162/fu22a.html PDF: https://proceedings.mlr.press/v162/fu22a/fu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yao family: Fu - given: John family: Cunningham - given: Mirella family: Lapata editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6811-6828 id: fu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6811 lastpage: 6828 published: 2022-06-28 00:00:00 +0000 - title: 'Greedy when Sure and Conservative when Uncertain about the Opponents' abstract: 'We develop a new approach, named Greedy when Sure and Conservative when Uncertain (GSCU), to competing online against unknown and nonstationary opponents. GSCU improves in four aspects: 1) introduces a novel way of learning opponent policy embeddings offline; 2) trains offline a single best response (conditional additionally on our opponent policy embedding) instead of a finite set of separate best responses against any opponent; 3) computes online a posterior of the current opponent policy embedding, without making the discrete and ineffective decision which type the current opponent belongs to; and 4) selects online between a real-time greedy policy and a fixed conservative policy via an adversarial bandit algorithm, gaining a theoretically better regret than adhering to either. Experimental studies on popular benchmarks demonstrate GSCU’s superiority over the state-of-the-art methods. The code is available online at \url{https://github.com/YeTianJHU/GSCU}.' 
volume: 162 URL: https://proceedings.mlr.press/v162/fu22b.html PDF: https://proceedings.mlr.press/v162/fu22b/fu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haobo family: Fu - given: Ye family: Tian - given: Hongxiang family: Yu - given: Weiming family: Liu - given: Shuang family: Wu - given: Jiechao family: Xiong - given: Ying family: Wen - given: Kai family: Li - given: Junliang family: Xing - given: Qiang family: Fu - given: Wei family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6829-6848 id: fu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 6829 lastpage: 6848 published: 2022-06-28 00:00:00 +0000 - title: 'DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks' abstract: 'Efficient deep neural network (DNN) models equipped with compact operators (e.g., depthwise convolutions) have shown great potential in reducing DNNs’ theoretical complexity (e.g., the total number of weights/operations) while maintaining a decent model accuracy. However, existing efficient DNNs are still limited in fulfilling their promise in boosting real-hardware efficiency, due to their commonly adopted compact operators’ low hardware utilization. In this work, we open up a new compression paradigm for developing real-hardware efficient DNNs, leading to boosted hardware efficiency while maintaining model accuracy. Interestingly, we observe that while some DNN layers’ activation functions help DNNs’ training optimization and achievable accuracy, they can be properly removed after training without compromising the model accuracy. Inspired by this observation, we propose a framework dubbed DepthShrinker, which develops hardware-friendly compact networks via shrinking the basic building blocks of existing efficient DNNs that feature irregular computation patterns into dense ones with much improved hardware utilization and thus real-hardware efficiency. Excitingly, our DepthShrinker framework delivers hardware-friendly compact networks that outperform both state-of-the-art efficient DNNs and compression techniques, e.g., a 3.06% higher accuracy and 1.53x throughput on Tesla V100 over SOTA channel-wise pruning method MetaPruning. Our codes are available at: https://github.com/facebookresearch/DepthShrinker.' 
volume: 162 URL: https://proceedings.mlr.press/v162/fu22c.html PDF: https://proceedings.mlr.press/v162/fu22c/fu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yonggan family: Fu - given: Haichuan family: Yang - given: Jiayi family: Yuan - given: Meng family: Li - given: Cheng family: Wan - given: Raghuraman family: Krishnamoorthi - given: Vikas family: Chandra - given: Yingyan family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6849-6862 id: fu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 6849 lastpage: 6862 published: 2022-06-28 00:00:00 +0000 - title: 'Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning' abstract: 'Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports some recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors and empirically validate our findings on a variety of domains, ranging from the simplified matrix and grid-world games to complex benchmarks such as StarCraft Multi-Agent Challenge and Google Research Football. We hope our insights could benefit the community towards developing more general and more powerful MARL algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/fu22d.html PDF: https://proceedings.mlr.press/v162/fu22d/fu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wei family: Fu - given: Chao family: Yu - given: Zelai family: Xu - given: Jiaqi family: Yang - given: Yi family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6863-6877 id: fu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 6863 lastpage: 6877 published: 2022-06-28 00:00:00 +0000 - title: '$p$-Laplacian Based Graph Neural Networks' abstract: 'Graph neural networks (GNNs) have demonstrated superior performance for semi-supervised node classification on graphs, as a result of their ability to exploit node features and topological information simultaneously.
However, most GNNs implicitly assume that the labels of nodes and their neighbors in a graph are the same or consistent, which does not hold in heterophilic graphs, where the labels of linked nodes are likely to differ. Moreover, when the topology is non-informative for label prediction, ordinary GNNs may work significantly worse than simply applying multi-layer perceptrons (MLPs) on each node. To tackle the above problem, we propose a new $p$-Laplacian based GNN model, termed as $^p$GNN, whose message passing mechanism is derived from a discrete regularization framework and could be theoretically explained as an approximation of a polynomial graph filter defined on the spectral domain of $p$-Laplacians. The spectral analysis shows that the new message passing mechanism works as low-high-pass filters, thus making $^p$GNNs effective on both homophilic and heterophilic graphs. Empirical studies on real-world and synthetic datasets validate our findings and demonstrate that $^p$GNNs significantly outperform several state-of-the-art GNN architectures on heterophilic benchmarks while achieving competitive performance on homophilic benchmarks. Moreover, $^p$GNNs can adaptively learn aggregation weights and are robust to noisy edges.' volume: 162 URL: https://proceedings.mlr.press/v162/fu22e.html PDF: https://proceedings.mlr.press/v162/fu22e/fu22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fu22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Guoji family: Fu - given: Peilin family: Zhao - given: Yatao family: Bian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6878-6917 id: fu22e issued: date-parts: - 2022 - 6 - 28 firstpage: 6878 lastpage: 6917 published: 2022-06-28 00:00:00 +0000 - title: 'Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error' abstract: 'In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.'
volume: 162 URL: https://proceedings.mlr.press/v162/fujimoto22a.html PDF: https://proceedings.mlr.press/v162/fujimoto22a/fujimoto22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-fujimoto22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Scott family: Fujimoto - given: David family: Meger - given: Doina family: Precup - given: Ofir family: Nachum - given: Shixiang Shane family: Gu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6918-6943 id: fujimoto22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6918 lastpage: 6943 published: 2022-06-28 00:00:00 +0000 - title: 'Robin Hood and Matthew Effects: Differential Privacy Has Disparate Impact on Synthetic Data' abstract: 'Generative models trained with Differential Privacy (DP) can be used to generate synthetic data while minimizing privacy risks. We analyze the impact of DP on these models vis-a-vis underrepresented classes/subgroups of data, specifically, studying: 1) the size of classes/subgroups in the synthetic data, and 2) the accuracy of classification tasks run on them. We also evaluate the effect of various levels of imbalance and privacy budgets. Our analysis uses three state-of-the-art DP models (PrivBayes, DP-WGAN, and PATE-GAN) and shows that DP yields opposite size distributions in the generated synthetic data. It affects the gap between the majority and minority classes/subgroups; in some cases by reducing it (a "Robin Hood" effect) and, in others, by increasing it (a "Matthew" effect). Either way, this leads to (similar) disparate impacts on the accuracy of classification tasks on the synthetic data, affecting disproportionately more the underrepresented subparts of the data. Consequently, when training models on synthetic data, one might incur the risk of treating different subpopulations unevenly, leading to unreliable or unfair conclusions.' volume: 162 URL: https://proceedings.mlr.press/v162/ganev22a.html PDF: https://proceedings.mlr.press/v162/ganev22a/ganev22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ganev22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Georgi family: Ganev - given: Bristena family: Oprisanu - given: Emiliano family: De Cristofaro editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6944-6959 id: ganev22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6944 lastpage: 6959 published: 2022-06-28 00:00:00 +0000 - title: 'The Complexity of k-Means Clustering when Little is Known' abstract: 'In the area of data analysis and arguably even in machine learning as a whole, few approaches have been as impactful as the classical k-means clustering. Here, we study the complexity of k-means clustering in settings where most of the data is not known or simply irrelevant. 
To obtain a more fine-grained understanding of the tractability of this clustering problem, we apply the parameterized complexity paradigm and obtain three new algorithms for k-means clustering of incomplete data: one for the clustering of bounded-domain (i.e., integer) data, and two incomparable algorithms that target real-valued data. Our approach is based on exploiting structural properties of a graphical encoding of the missing entries, and we show that tractability can be achieved using significantly less restrictive parameterizations than in the complementary case of few missing entries.' volume: 162 URL: https://proceedings.mlr.press/v162/ganian22a.html PDF: https://proceedings.mlr.press/v162/ganian22a/ganian22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ganian22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Robert family: Ganian - given: Thekla family: Hamm - given: Viktoriia family: Korchemna - given: Karolina family: Okrasa - given: Kirill family: Simonov editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6960-6987 id: ganian22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6960 lastpage: 6987 published: 2022-06-28 00:00:00 +0000 - title: 'IDYNO: Learning Nonparametric DAGs from Interventional Dynamic Data' abstract: 'Causal discovery in the form of a directed acyclic graph (DAG) for time series data has been widely studied in various domains. The resulting DAG typically represents a dynamic Bayesian network (DBN), capturing both the instantaneous and time-delayed relationships among variables of interest. We propose a new algorithm, IDYNO, to learn the DAG structure from potentially nonlinear time series data by using a continuous optimization framework that includes a recent formulation for continuous acyclicity constraint. The proposed algorithm is designed to handle both observational and interventional time series data. We demonstrate the promising performance of our method on synthetic benchmark datasets against state-of-the-art baselines. In addition, we show that the proposed method can more accurately learn the underlying structure of a sequential decision model, such as a Markov decision process, with a fixed policy in typical continuous control tasks.'
volume: 162 URL: https://proceedings.mlr.press/v162/gao22a.html PDF: https://proceedings.mlr.press/v162/gao22a/gao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tian family: Gao - given: Debarun family: Bhattacharjya - given: Elliot family: Nelson - given: Miao family: Liu - given: Yue family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 6988-7001 id: gao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 6988 lastpage: 7001 published: 2022-06-28 00:00:00 +0000 - title: 'Loss Function Learning for Domain Generalization by Implicit Gradient' abstract: 'Generalising robustly to distribution shift is a major challenge that is pervasive across most real-world applications of machine learning. A recent study highlighted that many advanced algorithms proposed to tackle such domain generalisation (DG) fail to outperform a properly tuned empirical risk minimisation (ERM) baseline. We take a different approach, and explore the impact of the ERM loss function on out-of-domain generalisation. In particular, we introduce a novel meta-learning approach to loss function search based on implicit gradient. This enables us to discover a general purpose parametric loss function that provides a drop-in replacement for cross-entropy. Our loss can be used in standard training pipelines to efficiently train robust models using any neural architecture on new datasets. The results show that it clearly surpasses cross-entropy, enables simple ERM to outperform some more complicated prior DG methods, and provides state-of-the-art performance across a variety of DG benchmarks. Furthermore, unlike most existing DG approaches, our setup applies to the most practical setting of single-source domain generalisation, on which we show significant improvement.' volume: 162 URL: https://proceedings.mlr.press/v162/gao22b.html PDF: https://proceedings.mlr.press/v162/gao22b/gao22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Boyan family: Gao - given: Henry family: Gouk - given: Yongxin family: Yang - given: Timothy family: Hospedales editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7002-7016 id: gao22b issued: date-parts: - 2022 - 6 - 28 firstpage: 7002 lastpage: 7016 published: 2022-06-28 00:00:00 +0000 - title: 'On the Convergence of Local Stochastic Compositional Gradient Descent with Momentum' abstract: 'Federated Learning has been actively studied due to its efficiency in numerous real-world applications in the past few years. However, the federated stochastic compositional optimization problem is still underexplored, even though it has widespread applications in machine learning. In this paper, we developed a novel local stochastic compositional gradient descent with momentum method, which facilitates Federated Learning for the stochastic compositional problem. 
Importantly, we investigated the convergence rate of our proposed method and proved that it can achieve the $O(1/\epsilon^4)$ sample complexity, which is better than existing methods. Meanwhile, our communication complexity $O(1/\epsilon^3)$ can match existing methods. To the best of our knowledge, this is the first work achieving such favorable sample and communication complexities. Additionally, extensive experimental results demonstrate the superior empirical performance over existing methods, confirming the efficacy of our method.' volume: 162 URL: https://proceedings.mlr.press/v162/gao22c.html PDF: https://proceedings.mlr.press/v162/gao22c/gao22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hongchang family: Gao - given: Junyi family: Li - given: Heng family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7017-7035 id: gao22c issued: date-parts: - 2022 - 6 - 28 firstpage: 7017 lastpage: 7035 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Reference Priors: What is the best way to pretrain a model?' abstract: 'What is the best way to exploit extra data – be it unlabeled data from the same task, or labeled data from a related task – to learn a given task? This paper formalizes the question using the theory of reference priors. Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the weights of the model. Such priors enable the task to maximally affect the Bayesian posterior, e.g., reference priors depend upon the number of samples available for learning the task and for very small sample sizes, the prior puts more probability mass on low-complexity models in the hypothesis space. This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data. We develop generalizations of reference priors and demonstrate applications to two problems. First, by using unlabeled data to compute the reference prior, we develop new Bayesian semi-supervised learning methods that remain effective even with very few samples per class. Second, by using labeled data from the source task to compute the reference prior, we develop a new pretraining method for transfer learning that allows data from the target task to maximally affect the Bayesian posterior. Empirical validation of these methods is conducted on image classification datasets. 
Code is available at https://github.com/grasp-lyrl/deep_reference_priors' volume: 162 URL: https://proceedings.mlr.press/v162/gao22d.html PDF: https://proceedings.mlr.press/v162/gao22d/gao22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yansong family: Gao - given: Rahul family: Ramesh - given: Pratik family: Chaudhari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7036-7051 id: gao22d issued: date-parts: - 2022 - 6 - 28 firstpage: 7036 lastpage: 7051 published: 2022-06-28 00:00:00 +0000 - title: 'On the Equivalence Between Temporal and Static Equivariant Graph Representations' abstract: 'This work formalizes the associational task of predicting node attribute evolution in temporal graphs from the perspective of learning equivariant representations. We show that node representations in temporal graphs can be cast into two distinct frameworks: (a) The most popular approach, which we denote as time-and-graph, where equivariant graph (e.g., GNN) and sequence (e.g., RNN) representations are intertwined to represent the temporal evolution of node attributes in the graph; and (b) an approach that we denote as time-then-graph, where the sequences describing the node and edge dynamics are represented first, then fed as node and edge attributes into a static equivariant graph representation that comes after. Interestingly, we show that time-then-graph representations have an expressivity advantage over time-and-graph representations when both use component GNNs that are not most-expressive (e.g., 1-Weisfeiler-Lehman GNNs). Moreover, while our goal is not necessarily to obtain state-of-the-art results, our experiments show that time-then-graph methods are capable of achieving better performance and efficiency than state-of-the-art time-and-graph methods in some real-world tasks, thereby showcasing that the time-then-graph framework is a worthy addition to the graph ML toolbox.' volume: 162 URL: https://proceedings.mlr.press/v162/gao22e.html PDF: https://proceedings.mlr.press/v162/gao22e/gao22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jianfei family: Gao - given: Bruno family: Ribeiro editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7052-7076 id: gao22e issued: date-parts: - 2022 - 6 - 28 firstpage: 7052 lastpage: 7076 published: 2022-06-28 00:00:00 +0000 - title: 'Generalizing Gaussian Smoothing for Random Search' abstract: 'Gaussian smoothing (GS) is a derivative-free optimization (DFO) algorithm that estimates the gradient of an objective using perturbations of the current parameters sampled from a standard normal distribution. We generalize it to sampling perturbations from a larger family of distributions. 
Based on an analysis of DFO for non-convex functions, we propose to choose a distribution for perturbations that minimizes the mean squared error (MSE) of the gradient estimate. We derive three such distributions with provably smaller MSE than Gaussian smoothing. We conduct evaluations of the three sampling distributions on linear regression, reinforcement learning, and DFO benchmarks in order to validate our claims. Our proposal improves on GS with the same computational complexity, and is competitive with and usually outperforms Guided ES and Orthogonal ES, two computationally more expensive algorithms that adapt the covariance matrix of normally distributed perturbations.' volume: 162 URL: https://proceedings.mlr.press/v162/gao22f.html PDF: https://proceedings.mlr.press/v162/gao22f/gao22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Katelyn family: Gao - given: Ozan family: Sener editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7077-7101 id: gao22f issued: date-parts: - 2022 - 6 - 28 firstpage: 7077 lastpage: 7101 published: 2022-06-28 00:00:00 +0000 - title: 'Rethinking Image-Scaling Attacks: The Interplay Between Vulnerabilities in Machine Learning Systems' abstract: 'As real-world images come in varying sizes, the machine learning model is part of a larger system that includes an upstream image scaling algorithm. In this paper, we investigate the interplay between vulnerabilities of the image scaling procedure and machine learning models in the decision-based black-box setting. We propose a novel sampling strategy to make a black-box attack exploit vulnerabilities in scaling algorithms, scaling defenses, and the final machine learning model in an end-to-end manner. Based on this scaling-aware attack, we reveal that most existing scaling defenses are ineffective under threat from downstream models. Moreover, we empirically observe that standard black-box attacks can significantly improve their performance by exploiting the vulnerable scaling procedure. We further demonstrate this problem on a commercial Image Analysis API with decision-based black-box attacks.' volume: 162 URL: https://proceedings.mlr.press/v162/gao22g.html PDF: https://proceedings.mlr.press/v162/gao22g/gao22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yue family: Gao - given: Ilia family: Shumailov - given: Kassem family: Fawaz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7102-7121 id: gao22g issued: date-parts: - 2022 - 6 - 28 firstpage: 7102 lastpage: 7121 published: 2022-06-28 00:00:00 +0000 - title: 'Lazy Estimation of Variable Importance for Large Neural Networks' abstract: 'As opaque predictive models increasingly impact many areas of modern life, interest in quantifying the importance of a given input variable for making a specific prediction has grown.
Recently, there has been a proliferation of model-agnostic methods to measure variable importance (VI) that analyze the difference in predictive power between a full model trained on all variables and a reduced model that excludes the variable(s) of interest. A bottleneck common to these methods is the estimation of the reduced model for each variable (or subset of variables), which is an expensive process that often does not come with theoretical guarantees. In this work, we propose a fast and flexible method for approximating the reduced model with important inferential guarantees. We replace the need for fully retraining a wide neural network by a linearization initialized at the full model parameters. By adding a ridge-like penalty to make the problem convex, we prove that when the ridge penalty parameter is sufficiently large, our method estimates the variable importance measure with an error rate of O(1/n) where n is the number of training samples. We also show that our estimator is asymptotically normal, enabling us to provide confidence bounds for the VI estimates. We demonstrate through simulations that our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.' volume: 162 URL: https://proceedings.mlr.press/v162/gao22h.html PDF: https://proceedings.mlr.press/v162/gao22h/gao22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yue family: Gao - given: Abby family: Stevens - given: Garvesh family: Raskutti - given: Rebecca family: Willett editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7122-7143 id: gao22h issued: date-parts: - 2022 - 6 - 28 firstpage: 7122 lastpage: 7143 published: 2022-06-28 00:00:00 +0000 - title: 'Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack' abstract: 'The AutoAttack (AA) has been the most reliable method to evaluate adversarial robustness when considerable computational resources are available. However, the high computational cost (e.g., 100 times more than that of the projected gradient descent attack) makes AA infeasible for practitioners with limited computational resources, and also hinders applications of AA in adversarial training (AT). In this paper, we propose a novel method, minimum-margin (MM) attack, to quickly and reliably evaluate adversarial robustness. Compared with AA, our method achieves comparable performance but only costs 3% of the computational time in extensive experiments. The reliability of our method lies in that we evaluate the quality of adversarial examples using the margin between two targets that can precisely identify the most adversarial example. The computational efficiency of our method lies in an effective Sequential TArget Ranking Selection (STARS) method, ensuring that the cost of the MM attack is independent of the number of classes. The MM attack opens a new way for evaluating adversarial robustness and provides a feasible and reliable way to generate high-quality adversarial examples in AT.'
volume: 162 URL: https://proceedings.mlr.press/v162/gao22i.html PDF: https://proceedings.mlr.press/v162/gao22i/gao22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruize family: Gao - given: Jiongxiao family: Wang - given: Kaiwen family: Zhou - given: Feng family: Liu - given: Binghui family: Xie - given: Gang family: Niu - given: Bo family: Han - given: James family: Cheng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7144-7163 id: gao22i issued: date-parts: - 2022 - 6 - 28 firstpage: 7144 lastpage: 7163 published: 2022-06-28 00:00:00 +0000 - title: 'Value Function based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems' abstract: 'Existing gradient-based optimization methods for hyperparameter tuning can only guarantee theoretical convergence to stationary solutions when the bilevel program satisfies the condition that for fixed upper-level variables, the lower-level is strongly convex (LLSC) and smooth (LLS). This condition is not satisfied for bilevel programs arising from tuning hyperparameters in many machine learning algorithms. In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). We then ask: can this algorithm achieve stationary solutions without LLSC and LLS assumptions? We provide a positive answer to this question for bilevel programs from a broad class of hyperparameter tuning applications. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed VF-iDCA when applied to tune hyperparameters.' volume: 162 URL: https://proceedings.mlr.press/v162/gao22j.html PDF: https://proceedings.mlr.press/v162/gao22j/gao22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lucy L family: Gao - given: Jane family: Ye - given: Haian family: Yin - given: Shangzhi family: Zeng - given: Jin family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7164-7182 id: gao22j issued: date-parts: - 2022 - 6 - 28 firstpage: 7164 lastpage: 7182 published: 2022-06-28 00:00:00 +0000 - title: 'Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization' abstract: 'Image cartoonization has recently been dominated by generative adversarial networks (GANs) from the perspective of unsupervised image-to-image translation, in which an inherent challenge is to precisely capture and sufficiently transfer characteristic cartoon styles (e.g., clear edges, smooth color shading, vivid colors, etc.). Existing advanced models try to enhance the cartoonization effect by learning to promote edges adversarially, introducing style transfer loss, or learning to align style from multiple representation spaces. This paper demonstrates that a more distinct and vivid cartoonization effect can be easily achieved with only a basic adversarial loss.
Observing that cartoon style is more evident in cartoon-texture-salient local image regions, we build a region-level adversarial learning branch in parallel with the normal image-level one, which constrains adversarial learning on cartoon-texture-salient local patches for better perceiving and transferring cartoon texture features. To this end, a novel cartoon-texture-saliency-sampler (CTSS) module is proposed to adaptively sample cartoon-texture-salient patches from training data. We show that such texture saliency adaptive attention is of significant importance in facilitating and enhancing cartoon stylization, which is a key missing ingredient of related methods. The superiority of our model in promoting the cartoonization effect, especially for high-resolution input images, is fully demonstrated with extensive experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/gao22k.html PDF: https://proceedings.mlr.press/v162/gao22k/gao22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gao22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiang family: Gao - given: Yuqi family: Zhang - given: Yingjie family: Tian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7183-7207 id: gao22k issued: date-parts: - 2022 - 6 - 28 firstpage: 7183 lastpage: 7207 published: 2022-06-28 00:00:00 +0000 - title: 'Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification' abstract: 'In modern classification tasks, the number of labels is getting larger and larger, as is the size of the datasets encountered in practice. As the number of classes increases, class ambiguity and class imbalance become more and more problematic to achieve high top-1 accuracy. Meanwhile, Top-K metrics (metrics allowing K guesses) have become popular, especially for performance reporting. Yet, proposing top-K losses tailored for deep learning remains a challenge, both theoretically and practically. In this paper we introduce a stochastic top-K hinge loss inspired by recent developments on top-K calibrated losses. Our proposal is based on the smoothing of the top-K operator building on the flexible "perturbed optimizer" framework. We show that our loss function performs very well in the case of balanced datasets, while benefiting from a significantly lower computational time than the state-of-the-art top-K loss function. In addition, we propose a simple variant of our loss for the imbalanced case. Experiments on a heavy-tailed dataset show that our loss function significantly outperforms other baseline loss functions.'
volume: 162 URL: https://proceedings.mlr.press/v162/garcin22a.html PDF: https://proceedings.mlr.press/v162/garcin22a/garcin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-garcin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Camille family: Garcin - given: Maximilien family: Servajean - given: Alexis family: Joly - given: Joseph family: Salmon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7208-7222 id: garcin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7208 lastpage: 7222 published: 2022-06-28 00:00:00 +0000 - title: 'PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation' abstract: 'Despite their success, policy gradient methods suffer from high variance of the gradient estimator, which can result in unsatisfactory sample complexity. Recently, numerous variance-reduced extensions of policy gradient methods with provably better sample complexity and competitive numerical performance have been proposed. After a compact survey on some of the main variance-reduced REINFORCE-type methods, we propose ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG), a novel loopless variance-reduced policy gradient method based on a probabilistic switch between two types of update. Our method is inspired by the PAGE estimator for supervised learning and leverages importance sampling to obtain an unbiased gradient estimator. We show that PAGE-PG enjoys a $\mathcal{O}\left( \epsilon^{-3} \right)$ average sample complexity to reach an $\epsilon$-stationary solution, which matches the sample complexity of its most competitive counterparts under the same setting. A numerical evaluation confirms the competitive performance of our method on classical control tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/gargiani22a.html PDF: https://proceedings.mlr.press/v162/gargiani22a/gargiani22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gargiani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Matilde family: Gargiani - given: Andrea family: Zanelli - given: Andrea family: Martinelli - given: Tyler family: Summers - given: John family: Lygeros editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7223-7240 id: gargiani22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7223 lastpage: 7240 published: 2022-06-28 00:00:00 +0000 - title: 'The power of first-order smooth optimization for black-box non-smooth problems' abstract: 'Gradient-free/zeroth-order methods for black-box convex optimization have been extensively studied in the last decade with the main focus on oracle calls complexity. In this paper, besides the oracle complexity, we focus also on iteration complexity, and propose a generic approach that, based on optimal first-order methods, allows to obtain in a black-box fashion new zeroth-order algorithms for non-smooth convex optimization problems. 
Our approach not only leads to optimal oracle complexity, but also allows to obtain iteration complexity similar to first-order methods, which, in turn, allows to exploit parallel computations to accelerate the convergence of our algorithms. We also elaborate on extensions for stochastic optimization problems, saddle-point problems, and distributed optimization.' volume: 162 URL: https://proceedings.mlr.press/v162/gasnikov22a.html PDF: https://proceedings.mlr.press/v162/gasnikov22a/gasnikov22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gasnikov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexander family: Gasnikov - given: Anton family: Novitskii - given: Vasilii family: Novitskii - given: Farshed family: Abdukhakimov - given: Dmitry family: Kamzolov - given: Aleksandr family: Beznosikov - given: Martin family: Takac - given: Pavel family: Dvurechensky - given: Bin family: Gu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7241-7265 id: gasnikov22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7241 lastpage: 7265 published: 2022-06-28 00:00:00 +0000 - title: 'A Functional Information Perspective on Model Interpretation' abstract: 'Contemporary predictive models are hard to interpret as their deep nets exploit numerous complex relations between input elements. This work suggests a theoretical framework for model interpretability by measuring the contribution of relevant features to the functional entropy of the network with respect to the input. We rely on the log-Sobolev inequality that bounds the functional entropy by the functional Fisher information with respect to the covariance of the data. This provides a principled way to measure the amount of information contribution of a subset of features to the decision function. Through extensive experiments, we show that our method surpasses existing interpretability sampling-based methods on various data signals such as image, text, and audio.' volume: 162 URL: https://proceedings.mlr.press/v162/gat22a.html PDF: https://proceedings.mlr.press/v162/gat22a/gat22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gat22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Itai family: Gat - given: Nitay family: Calderon - given: Roi family: Reichart - given: Tamir family: Hazan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7266-7278 id: gat22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7266 lastpage: 7278 published: 2022-06-28 00:00:00 +0000 - title: 'UniRank: Unimodal Bandit Algorithms for Online Ranking' abstract: 'We tackle, in the multiple-play bandit setting, the online ranking problem of assigning L items to K predefined positions on a web page in order to maximize the number of user clicks. We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. 
The regret bound of this algorithm is a direct consequence of the pseudo-unimodality property of the bandit setting with respect to a graph where nodes are ordered sets of indistinguishable items. The main contribution of UniRank is its $O(L/\Delta \log T)$ regret for $T$ consecutive assignments, where $\Delta$ relates to the reward-gap between two items. This regret bound is based on the usually implicit condition that two items may not have the same attractiveness. Experiments against state-of-the-art learning algorithms, specialized or not for different click models, show that our method has better regret performance than other generic algorithms on real-life and synthetic datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/gauthier22a.html PDF: https://proceedings.mlr.press/v162/gauthier22a/gauthier22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gauthier22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Camille-Sovanneary family: Gauthier - given: Romaric family: Gaudel - given: Elisa family: Fromont editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7279-7309 id: gauthier22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7279 lastpage: 7309 published: 2022-06-28 00:00:00 +0000 - title: 'Variational Inference with Locally Enhanced Bounds for Hierarchical Models' abstract: 'Hierarchical models represent a challenging setting for inference algorithms. MCMC methods struggle to scale to large models with many local variables and observations, and variational inference (VI) may fail to provide accurate approximations due to the use of simple variational families. Some variational methods (e.g. importance weighted VI) integrate Monte Carlo methods to give better accuracy, but these tend to be unsuitable for hierarchical models, as they do not allow for subsampling and their performance tends to degrade for high dimensional models. We propose a new family of variational bounds for hierarchical models, based on the application of tightening methods (e.g. importance weighting) separately for each group of local random variables. We show that our approach naturally allows the use of subsampling to get unbiased gradients, and that it fully leverages the power of methods that build tighter lower bounds by applying them independently in lower dimensional spaces, leading to better results and more accurate posterior approximations than relevant baselines.'
volume: 162 URL: https://proceedings.mlr.press/v162/geffner22a.html PDF: https://proceedings.mlr.press/v162/geffner22a/geffner22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-geffner22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tomas family: Geffner - given: Justin family: Domke editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7310-7323 id: geffner22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7310 lastpage: 7323 published: 2022-06-28 00:00:00 +0000 - title: 'Inducing Causal Structure for Interpretable Neural Networks' abstract: 'In many areas, we have well-founded insights about causal structure that would be useful to bring into our trained models while still allowing them to learn in a data-driven fashion. To achieve this, we present the new method of interchange intervention training (IIT). In IIT, we (1) align variables in a causal model (e.g., a deterministic program or Bayesian network) with representations in a neural model and (2) train the neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in both models are set to be the value they would be for a source input. IIT is fully differentiable, flexibly combines with other objectives, and guarantees that the target causal model is a causal abstraction of the neural model when its loss is zero. We evaluate IIT on a structural vision task (MNIST-PVR), a navigational language task (ReaSCAN), and a natural language inference task (MQNLI). We compare IIT against multi-task training objectives and data augmentation. In all our experiments, IIT achieves the best results and produces neural models that are more interpretable in the sense that they more successfully realize the target causal model.' volume: 162 URL: https://proceedings.mlr.press/v162/geiger22a.html PDF: https://proceedings.mlr.press/v162/geiger22a/geiger22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-geiger22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Atticus family: Geiger - given: Zhengxuan family: Wu - given: Hanson family: Lu - given: Josh family: Rozner - given: Elisa family: Kreiss - given: Thomas family: Icard - given: Noah family: Goodman - given: Christopher family: Potts editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7324-7338 id: geiger22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7324 lastpage: 7338 published: 2022-06-28 00:00:00 +0000 - title: 'Achieving Minimax Rates in Pool-Based Batch Active Learning' abstract: 'We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice due to the smaller number of interactive rounds with the labeling oracle (often human beings). However, batch active learning typically pays the price of a reduced adaptivity, leading to suboptimal results. 
In this paper we propose a solution which requires a careful trade-off between the informativeness of the queried points and their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand (pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm matches the known minimax rates in standard statistical learning settings. Our results also exhibit a mild dependence on the batch size. These are the first theoretical results that employ careful trade-offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario.' volume: 162 URL: https://proceedings.mlr.press/v162/gentile22a.html PDF: https://proceedings.mlr.press/v162/gentile22a/gentile22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gentile22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Claudio family: Gentile - given: Zhilei family: Wang - given: Tong family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7339-7367 id: gentile22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7339 lastpage: 7367 published: 2022-06-28 00:00:00 +0000 - title: 'Near-Exact Recovery for Tomographic Inverse Problems via Deep Learning' abstract: 'This work is concerned with the following fundamental question in scientific machine learning: Can deep-learning-based methods solve noise-free inverse problems to near-perfect accuracy? Positive evidence is provided for the first time, focusing on a prototypical computed tomography (CT) setup. We demonstrate that an iterative end-to-end network scheme enables reconstructions close to numerical precision, comparable to classical compressed sensing strategies. Our results build on our winning submission to the recent AAPM DL-Sparse-View CT Challenge. Its goal was to identify the state-of-the-art in solving the sparse-view CT inverse problem with data-driven techniques. A specific difficulty of the challenge setup was that the precise forward model remained unknown to the participants. Therefore, a key feature of our approach was to initially estimate the unknown fanbeam geometry in a data-driven calibration step. Apart from an in-depth analysis of our methodology, we also demonstrate its state-of-the-art performance on the open-access real-world dataset LoDoPaB CT.' 
volume: 162 URL: https://proceedings.mlr.press/v162/genzel22a.html PDF: https://proceedings.mlr.press/v162/genzel22a/genzel22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-genzel22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Martin family: Genzel - given: Ingo family: Gühring - given: Jan family: Macdonald - given: Maximilian family: März editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7368-7381 id: genzel22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7368 lastpage: 7381 published: 2022-06-28 00:00:00 +0000 - title: 'Online Learning for Min Sum Set Cover and Pandora’s Box' abstract: 'Two central problems in Stochastic Optimization are Min-Sum Set Cover and Pandora’s Box. In Pandora’s Box, we are presented with n boxes, each containing an unknown value and the goal is to open the boxes in some order to minimize the sum of the search cost and the smallest value found. Given a distribution of value vectors, we are asked to identify a near-optimal search order. Min-Sum Set Cover corresponds to the case where values are either 0 or infinity. In this work, we study the case where the value vectors are not drawn from a distribution but are presented to a learner in an online fashion. We present a computationally efficient algorithm that is constant-competitive against the cost of the optimal search order. We extend our results to a bandit setting where only the values of the boxes opened are revealed to the learner after every round. We also generalize our results to other commonly studied variants of Pandora’s Box and Min-Sum Set Cover that involve selecting more than a single value subject to a matroid constraint.' volume: 162 URL: https://proceedings.mlr.press/v162/gergatsouli22a.html PDF: https://proceedings.mlr.press/v162/gergatsouli22a/gergatsouli22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gergatsouli22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Evangelia family: Gergatsouli - given: Christos family: Tzamos editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7382-7403 id: gergatsouli22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7382 lastpage: 7403 published: 2022-06-28 00:00:00 +0000 - title: 'Equivariance versus Augmentation for Spherical Images' abstract: 'We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs and standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on single or multiple items from the MNIST- or FashionMNIST dataset projected onto the sphere. 
For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, it is possible for the standard CNNs to reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by the equivariant networks with significantly fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed tradeoff considerations between equivariant architectures and data augmentation for practical problems.' volume: 162 URL: https://proceedings.mlr.press/v162/gerken22a.html PDF: https://proceedings.mlr.press/v162/gerken22a/gerken22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gerken22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jan family: Gerken - given: Oscar family: Carlsson - given: Hampus family: Linander - given: Fredrik family: Ohlsson - given: Christoffer family: Petersson - given: Daniel family: Persson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7404-7421 id: gerken22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7404 lastpage: 7421 published: 2022-06-28 00:00:00 +0000 - title: 'A Regret Minimization Approach to Multi-Agent Control' abstract: 'We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. Our study focuses on optimal control without centralized precomputed policies, but rather with adaptive control policies for the different agents that are only equipped with a stabilizing controller. We give a reduction from any (standard) regret minimizing control method to a distributed algorithm. The reduction guarantees that the resulting distributed algorithm has low regret relative to the optimal precomputed joint policy. Our methodology involves generalizing online convex optimization to a multi-agent setting and applying recent tools from nonstochastic control derived for a single agent. We empirically evaluate our method on a model of an overactuated aircraft. We show that the distributed method is robust to failure and to adversarial perturbations in the dynamics.' volume: 162 URL: https://proceedings.mlr.press/v162/ghai22a.html PDF: https://proceedings.mlr.press/v162/ghai22a/ghai22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ghai22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Udaya family: Ghai - given: Udari family: Madhushani - given: Naomi family: Leonard - given: Elad family: Hazan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7422-7434 id: ghai22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7422 lastpage: 7434 published: 2022-06-28 00:00:00 +0000 - title: 'Blocks Assemble! 
Learning to Assemble with Large-Scale Structured Reinforcement Learning' abstract: 'Assembly of multi-part physical structures is both a valuable end product for autonomous robotics and a valuable diagnostic task for open-ended training of embodied intelligent agents. We introduce a naturalistic physics-based environment with a set of connectable magnet blocks inspired by children’s toy kits. The objective is to assemble blocks into a succession of target blueprints. Despite the simplicity of this objective, the compositional nature of building diverse blueprints from a set of blocks leads to an explosion of complexity in structures that agents encounter. Furthermore, assembly stresses agents’ multi-step planning, physical reasoning, and bimanual coordination. We find that the combination of large-scale reinforcement learning and graph-based policies – surprisingly without any additional complexity – is an effective recipe for training agents that not only generalize to complex unseen blueprints in a zero-shot manner, but even operate in a reset-free setting without being trained to do so. Through extensive experiments, we highlight the importance of large-scale training, structured representations, contributions of multi-task vs. single-task learning, as well as the effects of curriculums, and discuss qualitative behaviors of trained agents. Our accompanying project webpage can be found at: https://sites.google.com/view/learning-direct-assembly/home' volume: 162 URL: https://proceedings.mlr.press/v162/ghasemipour22a.html PDF: https://proceedings.mlr.press/v162/ghasemipour22a/ghasemipour22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ghasemipour22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Seyed Kamyar Seyed family: Ghasemipour - given: Satoshi family: Kataoka - given: Byron family: David - given: Daniel family: Freeman - given: Shixiang Shane family: Gu - given: Igor family: Mordatch editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7435-7469 id: ghasemipour22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7435 lastpage: 7469 published: 2022-06-28 00:00:00 +0000 - title: 'Faster Privacy Accounting via Evolving Discretization' abstract: 'We introduce a new algorithm for numerical composition of privacy random variables, useful for computing the accurate differential privacy parameters for compositions of mechanisms. Our algorithm achieves a running time and memory usage of $polylog(k)$ for the task of self-composing a mechanism, from a broad class of mechanisms, $k$ times; this class, e.g., includes the sub-sampled Gaussian mechanism, that appears in the analysis of differentially private stochastic gradient descent (DP-SGD). By comparison, recent work by Gopi et al. (NeurIPS 2021) has obtained a running time of $\widetilde{O}(\sqrt{k})$ for the same task. Our approach extends to the case of composing $k$ different mechanisms in the same class, improving upon the running time and memory usage in their work from $\widetilde{O}(k^{1.5})$ to $\widetilde{O}(k)$.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ghazi22a.html PDF: https://proceedings.mlr.press/v162/ghazi22a/ghazi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ghazi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Badih family: Ghazi - given: Pritish family: Kamath - given: Ravi family: Kumar - given: Pasin family: Manurangsi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7470-7483 id: ghazi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7470 lastpage: 7483 published: 2022-06-28 00:00:00 +0000 - title: 'Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations' abstract: 'Existing techniques for model inversion typically rely on hard-to-tune regularizers, such as total variation or feature regularization, which must be individually calibrated for each network in order to produce adequate images. In this work, we introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper-parameter tuning. Under our proposed augmentation-based scheme, the same set of augmentation hyper-parameters can be used for inverting a wide range of image classification models, regardless of input dimensions or the architecture. We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset, tasks which to the best of our knowledge have not been successfully accomplished by any previous works.' volume: 162 URL: https://proceedings.mlr.press/v162/ghiasi22a.html PDF: https://proceedings.mlr.press/v162/ghiasi22a/ghiasi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ghiasi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Amin family: Ghiasi - given: Hamid family: Kazemi - given: Steven family: Reich - given: Chen family: Zhu - given: Micah family: Goldblum - given: Tom family: Goldstein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7484-7512 id: ghiasi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7484 lastpage: 7512 published: 2022-06-28 00:00:00 +0000 - title: 'Offline RL Policies Should Be Trained to be Adaptive' abstract: 'Offline RL algorithms must account for the fact that the dataset they are provided may leave many facets of the environment unknown. The most common way to approach this challenge is to employ pessimistic or conservative methods, which avoid behaviors that are too dissimilar from those in the training dataset. However, relying exclusively on conservatism has drawbacks: performance is sensitive to the exact degree of conservatism, and conservative objectives can recover highly suboptimal policies. In this work, we propose that offline RL methods should instead be adaptive in the presence of uncertainty. We show that acting optimally in offline RL in a Bayesian sense involves solving an implicit POMDP. 
As a result, optimal policies for offline RL must be adaptive, depending not just on the current state but rather on all the transitions seen so far during evaluation. We present a model-free algorithm for approximating this optimal adaptive policy, and demonstrate the efficacy of learning such adaptive policies in offline RL benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/ghosh22a.html PDF: https://proceedings.mlr.press/v162/ghosh22a/ghosh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ghosh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dibya family: Ghosh - given: Anurag family: Ajay - given: Pulkit family: Agrawal - given: Sergey family: Levine editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7513-7530 id: ghosh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7513 lastpage: 7530 published: 2022-06-28 00:00:00 +0000 - title: 'Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits' abstract: 'We prove an instance-independent (poly) logarithmic regret for stochastic contextual bandits with linear payoff. Previously, in Chu et al. (2011), a lower bound of $\mathcal{O}(\sqrt{T})$ is shown for the contextual linear bandit problem with arbitrary (adversarially chosen) contexts. In this paper, we show that stochastic contexts indeed help to reduce the regret from $\sqrt{T}$ to $\polylog(T)$. We propose Low Regret Stochastic Contextual Bandits (\texttt{LR-SCB}), which takes advantage of the stochastic contexts and performs parameter estimation (in $\ell_2$ norm) and regret minimization simultaneously. \texttt{LR-SCB} works in epochs, where the parameter estimation of the previous epoch is used to reduce the regret of the current epoch. The (poly) logarithmic regret of \texttt{LR-SCB} stems from two crucial facts: (a) the application of a norm-adaptive algorithm to exploit the parameter estimation and (b) an analysis of the shifted linear contextual bandit algorithm, showing that shifting results in increasing regret. We have also shown experimentally that stochastic contexts indeed incur a regret that scales with $\polylog(T)$.' volume: 162 URL: https://proceedings.mlr.press/v162/ghosh22b.html PDF: https://proceedings.mlr.press/v162/ghosh22b/ghosh22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ghosh22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Avishek family: Ghosh - given: Abishek family: Sankararaman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7531-7549 id: ghosh22b issued: date-parts: - 2022 - 6 - 28 firstpage: 7531 lastpage: 7549 published: 2022-06-28 00:00:00 +0000 - title: 'SCHA-VAE: Hierarchical Context Aggregation for Few-Shot Generation' abstract: 'A few-shot generative model should be able to generate data from a novel distribution by only observing a limited set of examples. 
In few-shot learning the model is trained on data from many sets from distributions sharing some underlying properties such as sets of characters from different alphabets or objects from different categories. We extend current latent variable models for sets to a fully hierarchical approach with an attention-based point to set-level aggregation and call our method SCHA-VAE for Set-Context-Hierarchical-Aggregation Variational Autoencoder. We explore likelihood-based model comparison, iterative data sampling, and adaptation-free out-of-distribution generalization. Our results show that the hierarchical formulation better captures the intrinsic variability within the sets in the small data regime. This work generalizes deep latent variable approaches to few-shot learning, taking a step toward large-scale few-shot generation with a formulation that readily works with current state-of-the-art deep generative models.' volume: 162 URL: https://proceedings.mlr.press/v162/giannone22a.html PDF: https://proceedings.mlr.press/v162/giannone22a/giannone22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-giannone22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Giorgio family: Giannone - given: Ole family: Winther editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7550-7569 id: giannone22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7550 lastpage: 7569 published: 2022-06-28 00:00:00 +0000 - title: 'A Joint Exponential Mechanism For Differentially Private Top-$k$' abstract: 'We present a differentially private algorithm for releasing the sequence of $k$ elements with the highest counts from a data domain of $d$ elements. The algorithm is a "joint" instance of the exponential mechanism, and its output space consists of all $O(d^k)$ length-$k$ sequences. Our main contribution is a method to sample this exponential mechanism in time $O(dk\log(k) + d\log(d))$ and space $O(dk)$. Experiments show that this approach outperforms existing pure differential privacy methods and improves upon even approximate differential privacy methods for moderate $k$.' volume: 162 URL: https://proceedings.mlr.press/v162/gillenwater22a.html PDF: https://proceedings.mlr.press/v162/gillenwater22a/gillenwater22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gillenwater22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jennifer family: Gillenwater - given: Matthew family: Joseph - given: Andres family: Munoz - given: Monica Ribero family: Diaz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7570-7582 id: gillenwater22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7570 lastpage: 7582 published: 2022-06-28 00:00:00 +0000 - title: 'Neuro-Symbolic Hierarchical Rule Induction' abstract: 'We propose Neuro-Symbolic Hierarchical Rule Induction, an efficient interpretable neuro-symbolic model, to solve Inductive Logic Programming (ILP) problems. 
In this model, which is built from a pre-defined set of meta-rules organized in a hierarchical structure, first-order rules are invented by learning embeddings to match facts and body predicates of a meta-rule. To instantiate, we specifically design an expressive set of generic meta-rules, and demonstrate they generate a consequent fragment of Horn clauses. As a differentiable model, HRI can be trained both via supervised learning and reinforcement learning. To converge to interpretable rules, we inject controlled noise to avoid local optima and employ an interpretability-regularization term. We empirically validate our model on various tasks (ILP, visual genome, reinforcement learning) against relevant state-of-the-art methods, including traditional ILP methods and neuro-symbolic models.' volume: 162 URL: https://proceedings.mlr.press/v162/glanois22a.html PDF: https://proceedings.mlr.press/v162/glanois22a/glanois22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-glanois22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Claire family: Glanois - given: Zhaohui family: Jiang - given: Xuening family: Feng - given: Paul family: Weng - given: Matthieu family: Zimmer - given: Dong family: Li - given: Wulong family: Liu - given: Jianye family: Hao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7583-7615 id: glanois22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7583 lastpage: 7615 published: 2022-06-28 00:00:00 +0000 - title: 'It’s Raw! Audio Generation with State-Space Models' abstract: 'Developing architectures suitable for modeling raw audio is a challenging problem due to the high sampling rates of audio waveforms. Standard sequence modeling approaches like RNNs and CNNs have previously been tailored to fit the demands of audio, but the resultant architectures make undesirable computational tradeoffs and struggle to model waveforms effectively. We propose SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling. We identify that S4 can be unstable during autoregressive generation, and provide a simple improvement to its parameterization by drawing connections to Hurwitz matrices. SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting. Additionally, SaShiMi improves non-autoregressive generation performance when used as the backbone architecture for a diffusion model. Compared to prior architectures in the autoregressive generation setting, SaShiMi generates piano and speech waveforms which humans find more musical and coherent respectively, e.g. 2$\times$ better mean opinion scores than WaveNet on an unconditional speech generation task. 
On a music generation task, SaShiMi outperforms WaveNet on density estimation and speed at both training and inference even when using 3$\times$ fewer parameters.' volume: 162 URL: https://proceedings.mlr.press/v162/goel22a.html PDF: https://proceedings.mlr.press/v162/goel22a/goel22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-goel22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Karan family: Goel - given: Albert family: Gu - given: Chris family: Donahue - given: Christopher family: Re editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7616-7633 id: goel22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7616 lastpage: 7633 published: 2022-06-28 00:00:00 +0000 - title: 'RankSim: Ranking Similarity Regularization for Deep Imbalanced Regression' abstract: 'Data imbalance, in which a plurality of the data samples come from a small proportion of labels, poses a challenge in training deep neural networks. Unlike classification, in regression the labels are continuous, potentially boundless, and form a natural ordering. These distinct features of regression call for new techniques that leverage the additional information encoded in label-space relationships. This paper presents the RankSim (ranking similarity) regularizer for deep imbalanced regression, which encodes an inductive bias that samples that are closer in label space should also be closer in feature space. In contrast to recent distribution smoothing based approaches, RankSim captures both nearby and distant relationships: for a given data sample, RankSim encourages the sorted list of its neighbors in label space to match the sorted list of its neighbors in feature space. RankSim is complementary to conventional imbalanced learning techniques, including re-weighting, two-stage training, and distribution smoothing, and lifts the state-of-the-art performance on three imbalanced regression benchmarks: IMDB-WIKI-DIR, AgeDB-DIR, and STS-B-DIR.' volume: 162 URL: https://proceedings.mlr.press/v162/gong22a.html PDF: https://proceedings.mlr.press/v162/gong22a/gong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yu family: Gong - given: Greg family: Mori - given: Fred family: Tung editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7634-7649 id: gong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7634 lastpage: 7649 published: 2022-06-28 00:00:00 +0000 - title: 'How to Fill the Optimum Set? Population Gradient Descent with Harmless Diversity' abstract: 'Although traditional optimization methods focus on finding a single optimal solution, most objective functions in modern machine learning problems, especially those in deep learning, often have multiple or an infinite number of optimal points. Therefore, it is useful to consider the problem of finding a set of diverse points in the optimum set of an objective function. 
In this work, we frame this problem as a bi-level optimization problem of maximizing a diversity score inside the optimum set of the main loss function, and solve it with a simple population gradient descent framework that iteratively updates the points to maximize the diversity score in a fashion that does not hurt the optimization of the main loss. We demonstrate that our method can efficiently generate diverse solutions on multiple applications, e.g. text-to-image generation, text-to-mesh generation, molecular conformation generation and ensemble neural network training.' volume: 162 URL: https://proceedings.mlr.press/v162/gong22b.html PDF: https://proceedings.mlr.press/v162/gong22b/gong22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gong22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chengyue family: Gong - given: Lemeng family: Wu - given: Qiang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7650-7664 id: gong22b issued: date-parts: - 2022 - 6 - 28 firstpage: 7650 lastpage: 7664 published: 2022-06-28 00:00:00 +0000 - title: 'Partial Label Learning via Label Influence Function' abstract: 'To deal with ambiguities in partial label learning (PLL), state-of-the-art strategies implement disambiguations by identifying the ground-truth label directly from the candidate label set. However, these approaches usually take the label that incurs a minimal loss as the ground-truth label or use the weight to represent which label has a high likelihood to be the ground-truth label. Little work has been done to investigate from the perspective of how a candidate label changes a predictive model. In this paper, inspired by the influence function, we develop a novel PLL framework called Partial Label Learning via Label Influence Function (PLL-IF). Moreover, we implement the framework with two specific representative models, an SVM model and a neural network model, which are called the PLL-IF+SVM and PLL-IF+NN methods, respectively. Extensive experiments conducted on various datasets demonstrate the superiority of the proposed methods in terms of prediction accuracy, which in turn validates the effectiveness of the proposed PLL-IF framework.' volume: 162 URL: https://proceedings.mlr.press/v162/gong22c.html PDF: https://proceedings.mlr.press/v162/gong22c/gong22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gong22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiuwen family: Gong - given: Dong family: Yuan - given: Wei family: Bao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7665-7678 id: gong22c issued: date-parts: - 2022 - 6 - 28 firstpage: 7665 lastpage: 7678 published: 2022-06-28 00:00:00 +0000 - title: 'Secure Distributed Training at Scale' abstract: 'Many areas of deep learning benefit from using increasingly larger neural networks trained on public data, as is the case for pre-trained models for NLP and computer vision. 
Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers. One way to address it is for several smaller groups to pool their computational resources together and train a model that benefits all participants. Unfortunately, in this case, any participant can jeopardize the entire training run by sending incorrect updates, deliberately or by mistake. Training in presence of such peers requires specialized distributed training algorithms with Byzantine tolerance. These algorithms often sacrifice efficiency by introducing redundant communication or passing all updates through a trusted server, making it infeasible to apply them to large-scale deep learning, where models can have billions of parameters. In this work, we propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency.' volume: 162 URL: https://proceedings.mlr.press/v162/gorbunov22a.html PDF: https://proceedings.mlr.press/v162/gorbunov22a/gorbunov22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gorbunov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Eduard family: Gorbunov - given: Alexander family: Borzunov - given: Michael family: Diskin - given: Max family: Ryabinin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7679-7739 id: gorbunov22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7679 lastpage: 7739 published: 2022-06-28 00:00:00 +0000 - title: 'Retrieval-Augmented Reinforcement Learning' abstract: 'Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent’s behavior, and (4) behavior is limited by the capacity of the model. In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a retrieval process (parameterized as a neural network) that has direct access to a dataset of experiences. This dataset can come from the agent’s past experiences, expert demonstrations, or any other relevant source. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently. The proposed method facilitates learning agents that at test time can condition their behavior on the entire dataset and not only the current state, or current trajectory. We integrate our method into two different RL agents: an offline DQN agent and an online R2D2 agent. In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent. On Atari, we show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores. 
We run extensive ablations to measure the contributions of the components of our proposed method.' volume: 162 URL: https://proceedings.mlr.press/v162/goyal22a.html PDF: https://proceedings.mlr.press/v162/goyal22a/goyal22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-goyal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Anirudh family: Goyal - given: Abram family: Friesen - given: Andrea family: Banino - given: Theophane family: Weber - given: Nan Rosemary family: Ke - given: Adrià Puigdomènech family: Badia - given: Arthur family: Guez - given: Mehdi family: Mirza - given: Peter C family: Humphreys - given: Ksenia family: Konyushova - given: Michal family: Valko - given: Simon family: Osindero - given: Timothy family: Lillicrap - given: Nicolas family: Heess - given: Charles family: Blundell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7740-7765 id: goyal22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7740 lastpage: 7765 published: 2022-06-28 00:00:00 +0000 - title: 'The State of Sparse Training in Deep Reinforcement Learning' abstract: 'The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters required to train and store, as well as from an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts exploring their use in Deep Reinforcement Learning (DRL). In this work we perform a systematic investigation into applying a number of existing sparse training techniques on a variety of DRL agents and environments. Our results corroborate the findings from sparse training in the computer vision domain (sparse networks perform better than dense networks for the same parameter count) in the DRL domain. We provide detailed analyses on how the various components in DRL are affected by the use of sparse networks and conclude by suggesting promising avenues for improving the effectiveness of sparse training methods, as well as for advancing their use in DRL.' volume: 162 URL: https://proceedings.mlr.press/v162/graesser22a.html PDF: https://proceedings.mlr.press/v162/graesser22a/graesser22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-graesser22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Laura family: Graesser - given: Utku family: Evci - given: Erich family: Elsen - given: Pablo Samuel family: Castro editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7766-7792 id: graesser22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7766 lastpage: 7792 published: 2022-06-28 00:00:00 +0000 - title: 'Causal Inference Through the Structural Causal Marginal Problem' abstract: 'We introduce an approach to counterfactual inference based on merging information from multiple datasets. 
We consider a causal reformulation of the statistical marginal problem: given a collection of marginal structural causal models (SCMs) over distinct but overlapping sets of variables, determine the set of joint SCMs that are counterfactually consistent with the marginal ones. We formalise this approach for categorical SCMs using the response function formulation and show that it reduces the space of allowed marginal and joint SCMs. Our work thus highlights a new mode of falsifiability through additional variables, in contrast to the statistical one via additional data.' volume: 162 URL: https://proceedings.mlr.press/v162/gresele22a.html PDF: https://proceedings.mlr.press/v162/gresele22a/gresele22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gresele22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Luigi family: Gresele - given: Julius Von family: Kügelgen - given: Jonas family: Kübler - given: Elke family: Kirschbaum - given: Bernhard family: Schölkopf - given: Dominik family: Janzing editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7793-7824 id: gresele22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7793 lastpage: 7824 published: 2022-06-28 00:00:00 +0000 - title: 'Mirror Learning: A Unifying Framework of Policy Optimisation' abstract: 'Modern deep reinforcement learning (RL) algorithms are motivated by either the general policy improvement (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL assumptions, e.g. due to required regularisation or other heuristics. The current explanation of their empirical success is essentially “by analogy”: they are deemed approximate adaptations of theoretically sound methods. Unfortunately, studies have shown that in practice these algorithms differ greatly from their conceptual ancestors. In contrast, in this paper, we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO. While the latter two exploit the flexibility of our framework, GPI and TRL fit in merely as pathologically restrictive corner cases thereof. This suggests that the empirical performance of state-of-the-art methods is a direct consequence of their theoretical properties, rather than of aforementioned approximate analogies. Mirror learning sets us free to boldly explore novel, theoretically sound RL algorithms, a thus far uncharted wonderland.' 
volume: 162 URL: https://proceedings.mlr.press/v162/grudzien22a.html PDF: https://proceedings.mlr.press/v162/grudzien22a/grudzien22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-grudzien22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jakub family: Grudzien - given: Christian A Schroeder family: De Witt - given: Jakob family: Foerster editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7825-7844 id: grudzien22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7825 lastpage: 7844 published: 2022-06-28 00:00:00 +0000 - title: 'Adapting k-means Algorithms for Outliers' abstract: 'This paper shows how to adapt several simple and classical sampling-based algorithms for the k-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical k-means++ algorithm to the setting with outliers. However, their algorithm needs to output $O(\log(k)\cdot z)$ outliers, where $z$ is the number of true outliers, to match the $O(\log k)$-approximation guarantee of k-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed k-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1+\epsilon)z$ outliers while achieving an $O(1/\epsilon)$-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the distributed setting, we adapt a simple algorithm of Guha et al. (IEEE Trans. Know. and Data Engineering 2003) and the popular k-means$\|$ of Bahmani et al. (PVLDB 2012). A theoretical application of our techniques is an algorithm with running time $O(nk^2/z)$ that achieves an $O(1)$-approximation to the objective function while outputting $O(z)$ outliers, assuming $k \ll z \ll n$. This is complemented with a matching lower bound of $\Omega(nk^2/z)$ for this problem in the oracle model.' volume: 162 URL: https://proceedings.mlr.press/v162/grunau22a.html PDF: https://proceedings.mlr.press/v162/grunau22a/grunau22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-grunau22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Christoph family: Grunau - given: Václav family: Rozhoň editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7845-7886 id: grunau22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7845 lastpage: 7886 published: 2022-06-28 00:00:00 +0000 - title: 'Variational Mixtures of ODEs for Inferring Cellular Gene Expression Dynamics' abstract: 'A key problem in computational biology is discovering the gene expression changes that regulate cell fate transitions, in which one cell type turns into another. However, each individual cell cannot be tracked longitudinally, and cells at the same point in real time may be at different stages of the transition process. 
This can be viewed as a problem of learning the behavior of a dynamical system from observations whose times are unknown. Additionally, a single progenitor cell type often bifurcates into multiple child cell types, further complicating the problem of modeling the dynamics. To address this problem, we developed an approach called variational mixtures of ordinary differential equations. By using a simple family of ODEs informed by the biochemistry of gene expression to constrain the likelihood of a deep generative model, we can simultaneously infer the latent time and latent state of each cell and predict its future gene expression state. The model can be interpreted as a mixture of ODEs whose parameters vary continuously across a latent space of cell states. Our approach dramatically improves data fit, latent time inference, and future cell state estimation of single-cell gene expression data compared to previous approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/gu22a.html PDF: https://proceedings.mlr.press/v162/gu22a/gu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yichen family: Gu - given: David T family: Blaauw - given: Joshua family: Welch editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7887-7901 id: gu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7887 lastpage: 7901 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Pseudometric-based Action Representations for Offline Reinforcement Learning' abstract: 'Offline reinforcement learning is a promising approach for practical applications since it does not require interactions with real-world environments. However, existing offline RL methods only work well in environments with continuous or small discrete action spaces. In environments with large and discrete action spaces, such as recommender systems and dialogue systems, the performance of existing methods decreases drastically because they suffer from inaccurate value estimation for a large proportion of out-of-distribution (o.o.d.) actions. While recent works have demonstrated that online RL benefits from incorporating semantic information in action representations, unfortunately, they fail to learn reasonable relative distances between action representations, which is key to offline RL to reduce the influence of o.o.d. actions. This paper proposes an action representation learning framework for offline RL based on a pseudometric, which measures both the behavioral relation and the data-distributional relation between actions. We provide theoretical analysis on the continuity of the expected Q-values and the offline policy improvement using the learned action representations. Experimental results show that our methods significantly improve the performance of two typical offline RL methods in environments with large and discrete action spaces.' 
volume: 162 URL: https://proceedings.mlr.press/v162/gu22b.html PDF: https://proceedings.mlr.press/v162/gu22b/gu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pengjie family: Gu - given: Mengchen family: Zhao - given: Chen family: Chen - given: Dong family: Li - given: Jianye family: Hao - given: Bo family: An editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7902-7918 id: gu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 7902 lastpage: 7918 published: 2022-06-28 00:00:00 +0000 - title: 'NeuroFluid: Fluid Dynamics Grounding with Particle-Driven Neural Radiance Fields' abstract: 'Deep learning has shown great potential for modeling the physical dynamics of complex particle systems such as fluids. Existing approaches, however, require the supervision of consecutive particle properties, including positions and velocities. In this paper, we consider a partially observable scenario known as fluid dynamics grounding, that is, inferring the state transitions and interactions within the fluid particle systems from sequential visual observations of the fluid surface. We propose a differentiable two-stage network named NeuroFluid. Our approach consists of (i) a particle-driven neural renderer, which involves fluid physical properties into the volume rendering function, and (ii) a particle transition model optimized to reduce the differences between the rendered and the observed images. NeuroFluid provides the first solution to unsupervised learning of particle-based fluid dynamics by training these two models jointly. It is shown to reasonably estimate the underlying physics of fluids with different initial shapes, viscosity, and densities.' volume: 162 URL: https://proceedings.mlr.press/v162/guan22a.html PDF: https://proceedings.mlr.press/v162/guan22a/guan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shanyan family: Guan - given: Huayu family: Deng - given: Yunbo family: Wang - given: Xiaokang family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7919-7929 id: guan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7919 lastpage: 7929 published: 2022-06-28 00:00:00 +0000 - title: 'Fast-Rate PAC-Bayesian Generalization Bounds for Meta-Learning' abstract: 'PAC-Bayesian error bounds provide a theoretical guarantee on the generalization abilities of meta-learning from training tasks to unseen tasks. However, it is still unclear how tight PAC-Bayesian bounds we can achieve for meta-learning. In this work, we propose a general PAC-Bayesian framework to cope with single-task learning and meta-learning uniformly. With this framework, we generalize the two tightest PAC-Bayesian bounds (i.e., kl-bound and Catoni-bound) from single-task learning to standard meta-learning, resulting in fast convergence rates for PAC-Bayesian meta-learners. 
By minimizing the derived two bounds, we develop two meta-learning algorithms for classification problems with deep neural networks. For regression problems, by setting Gibbs optimal posterior for each training task, we obtain the closed-form formula of the minimizer of our Catoni-bound, leading to an efficient Gibbs meta-learning algorithm. Although minimizing our kl-bound can not yield a closed-form solution, we show that it can be extended for analyzing the more challenging meta-learning setting where samples from different training tasks exhibit interdependencies. Experiments empirically show that our proposed meta-learning algorithms achieve competitive results with respect to latest works.' volume: 162 URL: https://proceedings.mlr.press/v162/guan22b.html PDF: https://proceedings.mlr.press/v162/guan22b/guan22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guan22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiechao family: Guan - given: Zhiwu family: Lu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7930-7948 id: guan22b issued: date-parts: - 2022 - 6 - 28 firstpage: 7930 lastpage: 7948 published: 2022-06-28 00:00:00 +0000 - title: 'Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity' abstract: 'Creating reinforcement learning (RL) agents that are capable of accepting and leveraging task-specific knowledge from humans has been long identified as a possible strategy for developing scalable approaches for solving long-horizon problems. While previous works have looked at the possibility of using symbolic models along with RL approaches, they tend to assume that the high-level action models are executable at low level and the fluents can exclusively characterize all desirable MDP states. Symbolic models of real world tasks are however often incomplete. To this end, we introduce Approximate Symbolic-Model Guided Reinforcement Learning, wherein we will formalize the relationship between the symbolic model and the underlying MDP that will allow us to characterize the incompleteness of the symbolic model. We will use these models to extract high-level landmarks that will be used to decompose the task. At the low level, we learn a set of diverse policies for each possible task subgoal identified by the landmark, which are then stitched together. We evaluate our system by testing on three different benchmark domains and show how even with incomplete symbolic model information, our approach is able to discover the task structure and efficiently guide the RL agent towards the goal.' 
volume: 162 URL: https://proceedings.mlr.press/v162/guan22c.html PDF: https://proceedings.mlr.press/v162/guan22c/guan22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guan22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lin family: Guan - given: Sarath family: Sreedharan - given: Subbarao family: Kambhampati editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7949-7967 id: guan22c issued: date-parts: - 2022 - 6 - 28 firstpage: 7949 lastpage: 7967 published: 2022-06-28 00:00:00 +0000 - title: 'Large-Scale Graph Neural Architecture Search' abstract: 'Graph Neural Architecture Search (GNAS) has become a powerful method in automatically discovering suitable Graph Neural Network (GNN) architectures for different tasks. However, existing approaches fail to handle large-scale graphs because current performance estimation strategies in GNAS are computationally expensive for large-scale graphs and suffer from consistency collapse issues. To tackle these problems, we propose the Graph ArchitectUre Search at Scale (GAUSS) method that can handle large-scale graphs by designing an efficient light-weight supernet and the joint architecture-graph sampling. In particular, a graph sampling-based single-path one-shot supernet is proposed to reduce the computation burden. To address the consistency collapse issues, we further explicitly consider the joint architecture-graph sampling through a novel architecture peer learning mechanism on the sampled sub-graphs and an architecture importance sampling algorithm. Our proposed framework is able to smooth the highly non-convex optimization objective and stabilize the architecture sampling process. We provide theoretical analyses on GAUSS and empirically evaluate it on five datasets whose vertex sizes range from 10^4 to 10^8. The experimental results demonstrate substantial improvements of GAUSS over other GNAS baselines on all datasets. To the best of our knowledge, the proposed GAUSS method is the first graph neural architecture search framework that can handle graphs with billions of edges within 1 GPU day.' volume: 162 URL: https://proceedings.mlr.press/v162/guan22d.html PDF: https://proceedings.mlr.press/v162/guan22d/guan22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guan22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chaoyu family: Guan - given: Xin family: Wang - given: Hong family: Chen - given: Ziwei family: Zhang - given: Wenwu family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7968-7981 id: guan22d issued: date-parts: - 2022 - 6 - 28 firstpage: 7968 lastpage: 7981 published: 2022-06-28 00:00:00 +0000 - title: 'Identifiability Conditions for Domain Adaptation' abstract: 'Domain adaptation algorithms and theory have relied upon an assumption that the observed data uniquely specify the correct correspondence between the domains. 
Unfortunately, it is unclear under what conditions this identifiability assumption holds, even when restricting ourselves to the case where a correct bijective map between domains exists. We study this bijective domain mapping problem and provide several new sufficient conditions for the identifiability of linear domain maps. As a consequence of our analysis, we show that weak constraints on the third moment tensor suffice for identifiability, prove identifiability for common latent variable models such as topic models, and give a computationally tractable method for generating certificates for the identifiability of linear maps. Inspired by our certification method, we derive a new objective function for domain mapping that explicitly accounts for uncertainty over maps arising from unidentifiability. We demonstrate that our objective leads to improvements in uncertainty quantification and model performance estimation.' volume: 162 URL: https://proceedings.mlr.press/v162/gulrajani22a.html PDF: https://proceedings.mlr.press/v162/gulrajani22a/gulrajani22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gulrajani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ishaan family: Gulrajani - given: Tatsunori family: Hashimoto editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7982-7997 id: gulrajani22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7982 lastpage: 7997 published: 2022-06-28 00:00:00 +0000 - title: 'A Parametric Class of Approximate Gradient Updates for Policy Optimization' abstract: 'Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e.g. value versus policy representation) or how the learning objective is formulated, yet they share a common goal of maximizing expected return. To better capture the commonalities and identify key differences between policy optimization methods, we develop a unified perspective that re-expresses the underlying updates in terms of a limited choice of gradient form and scaling function. In particular, we identify a parameterized space of approximate gradient updates for policy optimization that is highly structured, yet covers both classical and recent examples, including PPO. As a result, we obtain novel yet well motivated updates that generalize existing algorithms in a way that can deliver benefits both in terms of convergence speed and final result quality. An experimental investigation demonstrates that the additional degrees of freedom provided in the parameterized family of updates can be leveraged to obtain non-trivial improvements both in synthetic domains and on popular deep RL benchmarks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/gummadi22a.html PDF: https://proceedings.mlr.press/v162/gummadi22a/gummadi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gummadi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ramki family: Gummadi - given: Saurabh family: Kumar - given: Junfeng family: Wen - given: Dale family: Schuurmans editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 7998-8015 id: gummadi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 7998 lastpage: 8015 published: 2022-06-28 00:00:00 +0000 - title: 'Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes' abstract: 'We study offline reinforcement learning (RL) for partially observable Markov decision processes (POMDPs) with possibly infinite state and observation spaces. Under the undercompleteness assumption, the optimal policy in such POMDPs is characterized by a class of finite-memory Bellman operators. In the offline setting, estimating these operators directly is challenging due to (i) the large observation space and (ii) insufficient coverage of the offline dataset. To tackle these challenges, we propose a novel algorithm that constructs confidence regions for these Bellman operators via offline estimation of their RKHS embeddings, and returns the final policy via pessimistic planning within the confidence regions. We prove that the proposed algorithm attains an $\epsilon$-optimal policy using an offline dataset containing $\tilde{\mathcal{O}}(1/\epsilon^2)$ episodes, provided that the behavior policy has good coverage over the optimal trajectory. To the best of our knowledge, our algorithm is the first provably sample-efficient offline algorithm for POMDPs without uniform coverage assumptions.' volume: 162 URL: https://proceedings.mlr.press/v162/guo22a.html PDF: https://proceedings.mlr.press/v162/guo22a/guo22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hongyi family: Guo - given: Qi family: Cai - given: Yufeng family: Zhang - given: Zhuoran family: Yang - given: Zhaoran family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8016-8038 id: guo22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8016 lastpage: 8038 published: 2022-06-28 00:00:00 +0000 - title: 'No-Regret Learning in Partially-Informed Auctions' abstract: 'Auctions with partially-revealed information about items are broadly employed in real-world applications, but the underlying mechanisms have limited theoretical support. In this work, we study a machine learning formulation of these types of mechanisms, presenting algorithms that are no-regret from the buyer’s perspective. Specifically, a buyer who wishes to maximize his utility interacts repeatedly with a platform over a series of $T$ rounds.
In each round, a new item is drawn from an unknown distribution and the platform publishes a price together with incomplete, “masked” information about the item. The buyer then decides whether to purchase the item. We formalize this problem as an online learning task where the goal is to have low regret with respect to a myopic oracle that has perfect knowledge of the distribution over items and the seller’s masking function. When the distribution over items is known to the buyer and the mask is a SimHash function mapping $\mathbb{R}^d$ to $\{0,1\}^{\ell}$, our algorithm has regret $\tilde{\mathcal{O}}((Td\ell)^{1/2})$. In a fully agnostic setting where the mask is an arbitrary function mapping to a set of size $n$ and the prices are stochastic, our algorithm has regret $\tilde{\mathcal{O}}((Tn)^{1/2})$.' volume: 162 URL: https://proceedings.mlr.press/v162/guo22b.html PDF: https://proceedings.mlr.press/v162/guo22b/guo22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wenshuo family: Guo - given: Michael family: Jordan - given: Ellen family: Vitercik editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8039-8055 id: guo22b issued: date-parts: - 2022 - 6 - 28 firstpage: 8039 lastpage: 8055 published: 2022-06-28 00:00:00 +0000 - title: 'Bounding Training Data Reconstruction in Private (Deep) Learning' abstract: 'Differential privacy is widely accepted as the de facto method for preventing data leakage in ML, and conventional wisdom suggests that it offers strong protection against privacy attacks. However, existing semantic guarantees for DP focus on membership inference, which may overestimate the adversary’s capabilities and is not applicable when membership status itself is non-sensitive. In this paper, we derive the first semantic guarantees for DP mechanisms against training data reconstruction attacks under a formal threat model. We show that two distinct privacy accounting methods, Renyi differential privacy and Fisher information leakage, both offer strong semantic protection against data reconstruction attacks.' volume: 162 URL: https://proceedings.mlr.press/v162/guo22c.html PDF: https://proceedings.mlr.press/v162/guo22c/guo22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chuan family: Guo - given: Brian family: Karrer - given: Kamalika family: Chaudhuri - given: Laurens prefix: van der family: Maaten editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8056-8071 id: guo22c issued: date-parts: - 2022 - 6 - 28 firstpage: 8056 lastpage: 8071 published: 2022-06-28 00:00:00 +0000 - title: 'Adversarially trained neural representations are already as robust as biological neural representations' abstract: 'Visual systems of primates are the gold standard of robust perception.
There is thus a general belief that mimicking the neural representations that underlie those systems will yield artificial visual systems that are adversarially robust. In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. We then leverage this method to demonstrate that the above-mentioned belief might not be well-founded. Specifically, we report that the biological neurons that make up visual systems of primates exhibit susceptibility to adversarial perturbations that is comparable in magnitude to existing (robustly trained) artificial neural networks.' volume: 162 URL: https://proceedings.mlr.press/v162/guo22d.html PDF: https://proceedings.mlr.press/v162/guo22d/guo22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chong family: Guo - given: Michael family: Lee - given: Guillaume family: Leclerc - given: Joel family: Dapello - given: Yug family: Rao - given: Aleksander family: Madry - given: James family: Dicarlo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8072-8081 id: guo22d issued: date-parts: - 2022 - 6 - 28 firstpage: 8072 lastpage: 8081 published: 2022-06-28 00:00:00 +0000 - title: 'Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding' abstract: 'Semi-supervised learning (SSL) has proven to be successful in overcoming labeling difficulties by leveraging unlabeled data. Previous SSL algorithms typically assume a balanced class distribution. However, real-world datasets are usually class-imbalanced, causing the performance of existing SSL algorithms to be seriously decreased. One essential reason is that pseudo-labels for unlabeled data are selected based on a fixed confidence threshold, resulting in low performance on minority classes. In this paper, we develop a simple yet effective framework, which only involves adaptive thresholding for different classes in SSL algorithms, and achieves remarkable performance improvement on more than twenty imbalance ratios. Specifically, we explicitly optimize the number of pseudo-labels for each class in the SSL objective, so as to simultaneously obtain adaptive thresholds and minimize empirical risk. Moreover, the determination of the adaptive threshold can be efficiently obtained by a closed-form solution. Extensive experimental results demonstrate the effectiveness of our proposed algorithms.' 
volume: 162 URL: https://proceedings.mlr.press/v162/guo22e.html PDF: https://proceedings.mlr.press/v162/guo22e/guo22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lan-Zhe family: Guo - given: Yu-Feng family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8082-8094 id: guo22e issued: date-parts: - 2022 - 6 - 28 firstpage: 8082 lastpage: 8094 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Squared Euclidean Approximation to the Levenshtein Distance for DNA Storage' abstract: 'Storing information in DNA molecules is of great interest because of its advantages in longevity, high storage density, and low maintenance cost. A key step in the DNA storage pipeline is to efficiently cluster the retrieved DNA sequences according to their similarities. The Levenshtein distance is the most suitable metric for the similarity between two DNA sequences, but it suffers from high computational complexity and is less compatible with mature clustering algorithms. In this work, we propose a novel deep squared Euclidean embedding for DNA sequences using a Siamese neural network, squared Euclidean embedding, and chi-squared regression. The Levenshtein distance is approximated by the squared Euclidean distance between the embedding vectors, which is fast to compute and friendly to clustering algorithms. The proposed approach is analyzed theoretically and experimentally. The results show that the proposed embedding is efficient and robust.' volume: 162 URL: https://proceedings.mlr.press/v162/guo22f.html PDF: https://proceedings.mlr.press/v162/guo22f/guo22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alan J.X. family: Guo - given: Cong family: Liang - given: Qing-Hu family: Hou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8095-8108 id: guo22f issued: date-parts: - 2022 - 6 - 28 firstpage: 8095 lastpage: 8108 published: 2022-06-28 00:00:00 +0000 - title: 'Online Continual Learning through Mutual Information Maximization' abstract: 'This paper proposes a new online continual learning approach called OCM based on mutual information (MI) maximization. It achieves two objectives that are critical in dealing with catastrophic forgetting (CF). (1) It reduces feature bias caused by cross entropy (CE) as CE learns only discriminative features for each task, but these features may not be discriminative for another task. To learn a new task well, the network parameters learned before have to be modified, which causes CF. The new approach encourages the learning of each task to make use of the full features of the task training data. (2) It encourages preservation of the previously learned knowledge when training a new batch of incrementally arriving data. Empirical evaluation shows that OCM substantially outperforms the latest online CL baselines.
For example, for CIFAR10, OCM improves the accuracy of the best baseline by 13.1% from 64.1% (baseline) to 77.2% (OCM). The code is publicly available at https://github.com/gydpku/OCM.' volume: 162 URL: https://proceedings.mlr.press/v162/guo22g.html PDF: https://proceedings.mlr.press/v162/guo22g/guo22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yiduo family: Guo - given: Bing family: Liu - given: Dongyan family: Zhao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8109-8126 id: guo22g issued: date-parts: - 2022 - 6 - 28 firstpage: 8109 lastpage: 8126 published: 2022-06-28 00:00:00 +0000 - title: 'Fast Provably Robust Decision Trees and Boosting' abstract: 'Learning with adversarial robustness has been a challenge in contemporary machine learning, and recent years have witnessed increasing attention on robust decision trees and ensembles, mostly working with high computational complexity or without guarantees of provable robustness. This work proposes the Fast Provably Robust Decision Tree (FPRDT) with the smallest computational complexity $O(n \log n)$, a tradeoff between global and local optimizations over the adversarial 0/1 loss. We further develop the Provably Robust AdaBoost (PRAdaBoost) according to our robust decision trees, and present a convergence analysis for training with the adversarial 0/1 loss. We conduct extensive experiments to support our approaches; in particular, our approaches are superior to unprovably robust methods, and achieve better or comparable performance to provably robust methods, yet with the smallest running time.' volume: 162 URL: https://proceedings.mlr.press/v162/guo22h.html PDF: https://proceedings.mlr.press/v162/guo22h/guo22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jun-Qi family: Guo - given: Ming-Zhuo family: Teng - given: Wei family: Gao - given: Zhi-Hua family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8127-8144 id: guo22h issued: date-parts: - 2022 - 6 - 28 firstpage: 8127 lastpage: 8144 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding and Improving Knowledge Graph Embedding for Entity Alignment' abstract: 'Embedding-based entity alignment (EEA) has recently received great attention. Despite significant performance improvement, little effort has been devoted to facilitating understanding of EEA methods. Most existing studies rest on the assumption that a small number of pre-aligned entities can serve as anchors connecting the embedding spaces of two KGs. Nevertheless, no one has investigated the rationality of such an assumption. To fill the research gap, we define a typical paradigm abstracted from existing EEA methods and analyze how the embedding discrepancy between two potentially aligned entities is implicitly bounded by a predefined margin in the score function.
Further, we find that such a bound cannot guarantee to be tight enough for alignment learning. We mitigate this problem by proposing a new approach, named NeoEA, to explicitly learn KG-invariant and principled entity embeddings. In this sense, an EEA model not only pursues the closeness of aligned entities based on geometric distance, but also aligns the neural ontologies of two KGs by eliminating the discrepancy in embedding distribution and underlying ontology knowledge. Our experiments demonstrate consistent and significant performance improvement against the best-performing EEA methods.' volume: 162 URL: https://proceedings.mlr.press/v162/guo22i.html PDF: https://proceedings.mlr.press/v162/guo22i/guo22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-guo22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lingbing family: Guo - given: Qiang family: Zhang - given: Zequn family: Sun - given: Mingyang family: Chen - given: Wei family: Hu - given: Huajun family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8145-8156 id: guo22i issued: date-parts: - 2022 - 6 - 28 firstpage: 8145 lastpage: 8156 published: 2022-06-28 00:00:00 +0000 - title: 'NISPA: Neuro-Inspired Stability-Plasticity Adaptation for Continual Learning in Sparse Networks' abstract: 'The goal of continual learning (CL) is to learn different tasks over time. The main desiderata associated with CL are to maintain performance on older tasks, leverage the latter to improve learning of future tasks, and to introduce minimal overhead in the training process (for instance, to not require a growing model or retraining). We propose the Neuro-Inspired Stability-Plasticity Adaptation (NISPA) architecture that addresses these desiderata through a sparse neural network with fixed density. NISPA forms stable paths to preserve learned knowledge from older tasks. Also, NISPA uses connection rewiring to create new plastic paths that reuse existing knowledge on novel tasks. Our extensive evaluation on EMNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets shows that NISPA significantly outperforms representative state-of-the-art continual learning baselines, and it uses up to ten times fewer learnable parameters compared to baselines. We also make the case that sparsity is an essential ingredient for continual learning. The NISPA code is available at https://github.com/BurakGurbuz97/NISPA.' 
volume: 162 URL: https://proceedings.mlr.press/v162/gurbuz22a.html PDF: https://proceedings.mlr.press/v162/gurbuz22a/gurbuz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-gurbuz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mustafa B family: Gurbuz - given: Constantine family: Dovrolis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8157-8174 id: gurbuz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8157 lastpage: 8174 published: 2022-06-28 00:00:00 +0000 - title: 'Active Learning on a Budget: Opposite Strategies Suit High and Low Budgets' abstract: 'Investigating active learning, we focus on the relation between the number of labeled examples (budget size) and suitable querying strategies. Our theoretical analysis shows a behavior reminiscent of phase transition: typical examples are best queried when the budget is low, while unrepresentative examples are best queried when the budget is large. Combined evidence shows that a similar phenomenon occurs in common classification models. Accordingly, we propose TypiClust – a deep active learning strategy suited for low budgets. In a comparative empirical investigation of supervised learning, using a variety of architectures and image datasets, TypiClust outperforms all other active learning strategies in the low-budget regime. When TypiClust is used in the semi-supervised framework, performance gets an even more significant boost. In particular, state-of-the-art semi-supervised methods trained on CIFAR-10 with 10 labeled examples selected by TypiClust reach 93.2% accuracy – an improvement of 39.4% over random selection. Code is available at https://github.com/avihu111/TypiClust.' volume: 162 URL: https://proceedings.mlr.press/v162/hacohen22a.html PDF: https://proceedings.mlr.press/v162/hacohen22a/hacohen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hacohen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Guy family: Hacohen - given: Avihu family: Dekel - given: Daphna family: Weinshall editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8175-8195 id: hacohen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8175 lastpage: 8195 published: 2022-06-28 00:00:00 +0000 - title: 'You Only Cut Once: Boosting Data Augmentation with a Single Cut' abstract: 'We present You Only Cut Once (YOCO) for performing data augmentations. YOCO cuts one image into two pieces and performs data augmentations individually within each piece. Applying YOCO improves the diversity of the augmentation per sample and encourages neural networks to recognize objects from partial information. YOCO is parameter-free, easy to use, and boosts almost all augmentations for free. Thorough experiments are conducted to evaluate its effectiveness.
We first demonstrate that YOCO can be seamlessly applied to varying data augmentations, neural network architectures, and brings performance gains on CIFAR and ImageNet classification tasks, sometimes surpassing conventional image-level augmentation by large margins. Moreover, we show YOCO benefits contrastive pre-training toward a more powerful representation that can be better transferred to multiple downstream tasks. Finally, we study a number of variants of YOCO and empirically analyze the performance for respective settings.' volume: 162 URL: https://proceedings.mlr.press/v162/han22a.html PDF: https://proceedings.mlr.press/v162/han22a/han22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-han22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junlin family: Han - given: Pengfei family: Fang - given: Weihao family: Li - given: Jie family: Hong - given: Mohammad Ali family: Armin - given: Ian family: Reid - given: Lars family: Petersson - given: Hongdong family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8196-8212 id: han22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8196 lastpage: 8212 published: 2022-06-28 00:00:00 +0000 - title: 'Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes' abstract: 'A determinantal point process (DPP) is an elegant model that assigns a probability to every subset of a collection of $n$ items. While conventionally a DPP is parameterized by a symmetric kernel matrix, removing this symmetry constraint, resulting in nonsymmetric DPPs (NDPPs), leads to significant improvements in modeling power and predictive performance. Recent work has studied an approximate Markov chain Monte Carlo (MCMC) sampling algorithm for NDPPs restricted to size-$k$ subsets (called $k$-NDPPs). However, the runtime of this approach is quadratic in $n$, making it infeasible for large-scale settings. In this work, we develop a scalable MCMC sampling algorithm for $k$-NDPPs with low-rank kernels, thus enabling runtime that is sublinear in $n$. Our method is based on a state-of-the-art NDPP rejection sampling algorithm, which we enhance with a novel approach for efficiently constructing the proposal distribution. Furthermore, we extend our scalable $k$-NDPP sampling algorithm to NDPPs without size constraints. Our resulting sampling method has polynomial time complexity in the rank of the kernel, while the existing approach has runtime that is exponential in the rank. With both a theoretical analysis and experiments on real-world datasets, we verify that our scalable approximate sampling algorithms are orders of magnitude faster than existing sampling approaches for $k$-NDPPs and NDPPs.' 
volume: 162 URL: https://proceedings.mlr.press/v162/han22b.html PDF: https://proceedings.mlr.press/v162/han22b/han22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-han22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Insu family: Han - given: Mike family: Gartrell - given: Elvis family: Dohmatob - given: Amin family: Karbasi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8213-8229 id: han22b issued: date-parts: - 2022 - 6 - 28 firstpage: 8213 lastpage: 8229 published: 2022-06-28 00:00:00 +0000 - title: 'G-Mixup: Graph Data Augmentation for Graph Classification' abstract: 'This work develops mixup for graph data. Mixup has shown superiority in improving the generalization and robustness of neural networks by interpolating features and labels between two random samples. Traditionally, Mixup can work on regular, grid-like, and Euclidean data such as images or tabular data. However, it is challenging to directly adopt Mixup to augment graph data because different graphs typically: 1) have different numbers of nodes; 2) are not readily aligned; and 3) have unique topologies in non-Euclidean space. To this end, we propose G-Mixup to augment graphs for graph classification by interpolating the generator (i.e., graphon) of different classes of graphs. Specifically, we first use graphs within the same class to estimate a graphon. Then, instead of directly manipulating graphs, we interpolate graphons of different classes in the Euclidean space to get mixed graphons, where the synthetic graphs are generated through sampling based on the mixed graphons. Extensive experiments show that G-Mixup substantially improves the generalization and robustness of GNNs.' volume: 162 URL: https://proceedings.mlr.press/v162/han22c.html PDF: https://proceedings.mlr.press/v162/han22c/han22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-han22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaotian family: Han - given: Zhimeng family: Jiang - given: Ninghao family: Liu - given: Xia family: Hu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8230-8248 id: han22c issued: date-parts: - 2022 - 6 - 28 firstpage: 8230 lastpage: 8248 published: 2022-06-28 00:00:00 +0000 - title: 'Private Streaming SCO in $\ell_p$ geometry with Applications in High Dimensional Online Decision Making' abstract: 'Differentially private (DP) stochastic convex optimization (SCO) is ubiquitous in trustworthy machine learning algorithm design. This paper studies the DP-SCO problem with streaming data that are sampled from a distribution and arrive sequentially. We also consider the continual release model where parameters related to private information are updated and released upon each new data point. Numerous algorithms have been developed to achieve optimal excess risks in different $\ell_p$ norm geometries, but none of the existing ones can be adapted to the streaming and continual release setting.
We propose a private variant of the Frank-Wolfe algorithm with recursive gradients for variance reduction to update and reveal the parameters upon each data. Combined with the adaptive DP analysis, our algorithm achieves the first optimal excess risk in linear time in the case $1contextual IDS over conditional IDS and emphasize the importance of considering the context distribution. The main message is that an intelligent agent should invest more on the actions that are beneficial for the future unseen contexts while the conditional IDS can be myopic. We further propose a computationally-efficient version of contextual IDS based on Actor-Critic and evaluate it empirically on a neural network contextual bandit.' volume: 162 URL: https://proceedings.mlr.press/v162/hao22b.html PDF: https://proceedings.mlr.press/v162/hao22b/hao22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hao22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Botao family: Hao - given: Tor family: Lattimore - given: Chao family: Qin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8446-8464 id: hao22b issued: date-parts: - 2022 - 6 - 28 firstpage: 8446 lastpage: 8464 published: 2022-06-28 00:00:00 +0000 - title: 'GSmooth: Certified Robustness against Semantic Transformations via Generalized Randomized Smoothing' abstract: 'Certified defenses such as randomized smoothing have shown promise towards building reliable machine learning systems against $\ell_p$ norm bounded attacks. However, existing methods are insufficient or unable to provably defend against semantic transformations, especially those without closed-form expressions (such as defocus blur and pixelate), which are more common in practice and often unrestricted. To fill up this gap, we propose generalized randomized smoothing (GSmooth), a unified theoretical framework for certifying robustness against general semantic transformations via a novel dimension augmentation strategy. Under the GSmooth framework, we present a scalable algorithm that uses a surrogate image-to-image network to approximate the complex transformation. The surrogate model provides a powerful tool for studying the properties of semantic transformations and certifying robustness. Experimental results on several datasets demonstrate the effectiveness of our approach for robustness certification against multiple kinds of semantic transformations and corruptions, which is not achievable by the alternative baselines.' 
volume: 162 URL: https://proceedings.mlr.press/v162/hao22c.html PDF: https://proceedings.mlr.press/v162/hao22c/hao22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hao22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhongkai family: Hao - given: Chengyang family: Ying - given: Yinpeng family: Dong - given: Hang family: Su - given: Jian family: Song - given: Jun family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8465-8483 id: hao22c issued: date-parts: - 2022 - 6 - 28 firstpage: 8465 lastpage: 8483 published: 2022-06-28 00:00:00 +0000 - title: 'Implicit Regularization with Polynomial Growth in Deep Tensor Factorization' abstract: 'We study the implicit regularization effects of deep learning in tensor factorization. While implicit regularization in deep matrix and ‘shallow’ tensor factorization via linear and certain types of non-linear neural networks promotes low-rank solutions with at most quadratic growth, we show that its effect in deep tensor factorization grows polynomially with the depth of the network. This provides a remarkably faithful description of the observed experimental behaviour. Using numerical experiments, we demonstrate the benefits of this implicit regularization in yielding a more accurate estimation and better convergence properties.' volume: 162 URL: https://proceedings.mlr.press/v162/hariz22a.html PDF: https://proceedings.mlr.press/v162/hariz22a/hariz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hariz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kais family: Hariz - given: Hachem family: Kadri - given: Stephane family: Ayache - given: Maher family: Moakher - given: Thierry family: Artieres editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8484-8501 id: hariz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8484 lastpage: 8501 published: 2022-06-28 00:00:00 +0000 - title: 'Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses' abstract: 'In settings where Machine Learning (ML) algorithms automate or inform consequential decisions about people, individual decision subjects are often incentivized to strategically modify their observable attributes to receive more favorable predictions. As a result, the distribution the assessment rule is trained on may differ from the one it operates on in deployment. While such distribution shifts, in general, can hinder accurate predictions, our work identifies a unique opportunity associated with shifts due to strategic responses: We show that we can use strategic responses effectively to recover causal relationships between the observable features and outcomes we wish to predict, even under the presence of unobserved confounding variables.
Specifically, our work establishes a novel connection between strategic responses to ML models and instrumental variable (IV) regression by observing that the sequence of deployed models can be viewed as an instrument that affects agents’ observable features but does not directly influence their outcomes. We show that our causal recovery method can be utilized to improve decision-making across several important criteria: individual fairness, agent outcomes, and predictive risk. In particular, we show that if decision subjects differ in their ability to modify non-causal attributes, any decision rule deviating from the causal coefficients can lead to (potentially unbounded) individual-level unfairness.' volume: 162 URL: https://proceedings.mlr.press/v162/harris22a.html PDF: https://proceedings.mlr.press/v162/harris22a/harris22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-harris22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Keegan family: Harris - given: Dung Daniel T family: Ngo - given: Logan family: Stapleton - given: Hoda family: Heidari - given: Steven family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8502-8522 id: harris22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8502 lastpage: 8522 published: 2022-06-28 00:00:00 +0000 - title: 'C*-algebra Net: A New Approach Generalizing Neural Network Parameters to C*-algebra' abstract: 'We propose a new framework that generalizes the parameters of neural network models to $C^*$-algebra-valued ones. $C^*$-algebra is a generalization of the space of complex numbers. A typical example is the space of continuous functions on a compact space. This generalization enables us to combine multiple models continuously and use tools for functions such as regression and integration. Consequently, we can learn features of data efficiently and adapt the models to problems continuously. We apply our framework to practical problems such as density estimation and few-shot learning and show that our framework enables us to learn features of data even with a limited number of samples. Our new framework highlights the possibility of applying the theory of $C^*$-algebra to general neural network models.'
volume: 162 URL: https://proceedings.mlr.press/v162/hashimoto22a.html PDF: https://proceedings.mlr.press/v162/hashimoto22a/hashimoto22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hashimoto22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuka family: Hashimoto - given: Zhao family: Wang - given: Tomoko family: Matsui editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8523-8534 id: hashimoto22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8523 lastpage: 8534 published: 2022-06-28 00:00:00 +0000 - title: 'General-purpose, long-context autoregressive modeling with Perceiver AR' abstract: 'Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64x64 ImageNet images and PG-19 books.' volume: 162 URL: https://proceedings.mlr.press/v162/hawthorne22a.html PDF: https://proceedings.mlr.press/v162/hawthorne22a/hawthorne22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hawthorne22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Curtis family: Hawthorne - given: Andrew family: Jaegle - given: Cătălina family: Cangea - given: Sebastian family: Borgeaud - given: Charlie family: Nash - given: Mateusz family: Malinowski - given: Sander family: Dieleman - given: Oriol family: Vinyals - given: Matthew family: Botvinick - given: Ian family: Simon - given: Hannah family: Sheahan - given: Neil family: Zeghidour - given: Jean-Baptiste family: Alayrac - given: Joao family: Carreira - given: Jesse family: Engel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8535-8558 id: hawthorne22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8535 lastpage: 8558 published: 2022-06-28 00:00:00 +0000 - title: 'On Distribution Shift in Learning-based Bug Detectors' abstract: 'Deep learning has recently achieved initial success in program analysis tasks such as bug detection. Lacking real bugs, most existing works construct training and test data by injecting synthetic bugs into correct programs. 
Despite achieving high test accuracy (e.g., >90%), the resulting bug detectors are found to be surprisingly unusable in practice, i.e., <10% precision when used to scan real software repositories. In this work, we argue that this massive performance difference is caused by a distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. To address this key challenge, we propose to train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution. During these two phases, we leverage a multi-task hierarchy, focal loss, and contrastive learning to further boost performance. We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our test set and the latest version of open source repositories. Our code, datasets, and models are publicly available at https://github.com/eth-sri/learning-real-bug-detector.' volume: 162 URL: https://proceedings.mlr.press/v162/he22a.html PDF: https://proceedings.mlr.press/v162/he22a/he22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-he22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jingxuan family: He - given: Luca family: Beurer-Kellner - given: Martin family: Vechev editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8559-8580 id: he22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8559 lastpage: 8580 published: 2022-06-28 00:00:00 +0000 - title: 'GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks' abstract: 'Recovering global rankings from pairwise comparisons has wide applications from time synchronization to sports team ranking. Pairwise comparisons corresponding to matches in a competition can be construed as edges in a directed graph (digraph), whose nodes represent e.g. competitors with an unknown rank. In this paper, we introduce neural networks into the ranking recovery problem by proposing the so-called GNNRank, a trainable GNN-based framework with digraph embedding. Moreover, new objectives are devised to encode ranking upsets/violations. The framework involves a ranking score estimation approach, and adds an inductive bias by unfolding the Fiedler vector computation of the graph constructed from a learnable similarity matrix. Experimental results on extensive data sets show that our methods attain competitive and often superior performance against baselines, as well as showing promising transfer ability. Codes and preprocessed data are at: \url{https://github.com/SherylHYX/GNNRank}.' 
volume: 162 URL: https://proceedings.mlr.press/v162/he22b.html PDF: https://proceedings.mlr.press/v162/he22b/he22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-he22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yixuan family: He - given: Quan family: Gan - given: David family: Wipf - given: Gesine D family: Reinert - given: Junchi family: Yan - given: Mihai family: Cucuringu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8581-8612 id: he22b issued: date-parts: - 2022 - 6 - 28 firstpage: 8581 lastpage: 8612 published: 2022-06-28 00:00:00 +0000 - title: 'Exploring the Gap between Collapsed & Whitened Features in Self-Supervised Learning' abstract: 'Avoiding feature collapse, when a Neural Network (NN) encoder maps all inputs to a constant vector, is a shared implicit desideratum of various methodological advances in self-supervised learning (SSL). To that end, whitened features have been proposed as an explicit objective to ensure uncollapsed features \cite{zbontar2021barlow,ermolov2021whitening,hua2021feature,bardes2022vicreg}. We identify power law behaviour in eigenvalue decay, parameterised by exponent $\beta{\geq}0$, as a spectrum that bridges between the collapsed & whitened feature extremes. We provide theoretical & empirical evidence highlighting the factors in SSL, like projection layers & regularisation strength, that influence eigenvalue decay rate, & demonstrate that the degree of feature whitening affects generalisation, particularly in label scarce regimes. We use our insights to motivate a novel method, PMP (PostMan-Pat), which efficiently post-processes a pretrained encoder to enforce eigenvalue decay rate with power law exponent $\beta$, & find that PostMan-Pat delivers improved label efficiency and transferability across a range of SSL methods and encoder architectures.' volume: 162 URL: https://proceedings.mlr.press/v162/he22c.html PDF: https://proceedings.mlr.press/v162/he22c/he22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-he22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bobby family: He - given: Mete family: Ozay editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8613-8634 id: he22c issued: date-parts: - 2022 - 6 - 28 firstpage: 8613 lastpage: 8634 published: 2022-06-28 00:00:00 +0000 - title: 'Sparse Double Descent: Where Network Pruning Aggravates Overfitting' abstract: 'People usually believe that network pruning not only reduces the computational cost of deep networks, but also prevents overfitting by decreasing model capacity. However, our work surprisingly discovers that network pruning sometimes even aggravates overfitting. We report an unexpected sparse double descent phenomenon that, as we increase model sparsity via network pruning, test performance first gets worse (due to overfitting), then gets better (due to relieved overfitting), and gets worse at last (due to forgetting useful information). 
While recent studies focused on the deep double descent with respect to model overparameterization, they failed to recognize that sparsity may also cause double descent. In this paper, we have three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. Second, for this phenomenon, we propose a novel learning distance interpretation that the curve of l2 learning distance of sparse models (from initialized parameters to final parameters) may correlate with the sparse double descent curve well and reflect generalization better than minima flatness. Third, in the context of sparse double descent, a winning ticket in the lottery ticket hypothesis surprisingly may not always win.' volume: 162 URL: https://proceedings.mlr.press/v162/he22d.html PDF: https://proceedings.mlr.press/v162/he22d/he22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-he22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zheng family: He - given: Zeke family: Xie - given: Quanzhi family: Zhu - given: Zengchang family: Qin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8635-8659 id: he22d issued: date-parts: - 2022 - 6 - 28 firstpage: 8635 lastpage: 8659 published: 2022-06-28 00:00:00 +0000 - title: 'A Reduction from Linear Contextual Bandits Lower Bounds to Estimations Lower Bounds' abstract: 'Linear contextual bandits and their variants are usually solved using algorithms guided by parameter estimation. Cauchy-Schwartz inequality established that estimation errors dominate algorithm regrets, and thus, accurate estimators suffice to guarantee algorithms with low regrets. In this paper, we complete the reverse direction by establishing the necessity. In particular, we provide a generic transformation from algorithms for linear contextual bandits to estimators for linear models, and show that algorithm regrets dominate estimation errors of their induced estimators, i.e., low-regret algorithms must imply accurate estimators. Moreover, our analysis reduces the regret lower bound to an estimation error, bridging the lower bound analysis in linear contextual bandit problems and linear regression.' volume: 162 URL: https://proceedings.mlr.press/v162/he22e.html PDF: https://proceedings.mlr.press/v162/he22e/he22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-he22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiahao family: He - given: Jiheng family: Zhang - given: Rachel Q. family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8660-8677 id: he22e issued: date-parts: - 2022 - 6 - 28 firstpage: 8660 lastpage: 8677 published: 2022-06-28 00:00:00 +0000 - title: 'HyperPrompt: Prompt-based Task-Conditioning of Transformers' abstract: 'Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter efficient way. 
Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps where the hyper-prompts serve as task global memories for the queries to attend to, at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as 0.14% of additional task-conditioning parameters, achieving great parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performances over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning and HyperFormer++ on Natural Language Understanding benchmarks of GLUE and SuperGLUE across many model sizes.' volume: 162 URL: https://proceedings.mlr.press/v162/he22f.html PDF: https://proceedings.mlr.press/v162/he22f/he22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-he22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yun family: He - given: Steven family: Zheng - given: Yi family: Tay - given: Jai family: Gupta - given: Yu family: Du - given: Vamsi family: Aribandi - given: Zhe family: Zhao - given: Yaguang family: Li - given: Zhao family: Chen - given: Donald family: Metzler - given: Heng-Tze family: Cheng - given: Ed H. family: Chi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8678-8690 id: he22f issued: date-parts: - 2022 - 6 - 28 firstpage: 8678 lastpage: 8690 published: 2022-06-28 00:00:00 +0000 - title: 'Label-Descriptive Patterns and Their Application to Characterizing Classification Errors' abstract: 'State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless. Characterizing these errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors, but also gives a way to act and improve the classifier. We propose to discover those feature-value combinations (i.e., patterns) that strongly correlate with correct resp. erroneous predictions to obtain a global and interpretable description for arbitrary classifiers. We show this is an instance of the more general label description problem, which we formulate in terms of the Minimum Description Length principle. To discover a good pattern set, we develop the efficient Premise algorithm. Through an extensive set of experiments we show it performs very well in practice on both synthetic and real-world data. Unlike existing solutions, it ably recovers ground truth patterns, even on highly imbalanced data over many features. Through two case studies on Visual Question Answering and Named Entity Recognition, we confirm that Premise gives clear and actionable insight into the systematic errors made by modern NLP classifiers.' 
volume: 162 URL: https://proceedings.mlr.press/v162/hedderich22a.html PDF: https://proceedings.mlr.press/v162/hedderich22a/hedderich22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hedderich22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Michael A. family: Hedderich - given: Jonas family: Fischer - given: Dietrich family: Klakow - given: Jilles family: Vreeken editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8691-8707 id: hedderich22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8691 lastpage: 8707 published: 2022-06-28 00:00:00 +0000 - title: 'NOMU: Neural Optimization-based Model Uncertainty' abstract: 'We study methods for estimating model uncertainty for neural networks (NNs) in regression. To isolate the effect of model uncertainty, we focus on a noiseless setting with scarce training data. We introduce five important desiderata regarding model uncertainty that any method should satisfy. However, we find that established benchmarks often fail to reliably capture some of these desiderata, even those that are required by Bayesian theory. To address this, we introduce a new approach for capturing model uncertainty for NNs, which we call Neural Optimization-based Model Uncertainty (NOMU). The main idea of NOMU is to design a network architecture consisting of two connected sub-NNs, one for model prediction and one for model uncertainty, and to train it using a carefully-designed loss function. Importantly, our design enforces that NOMU satisfies our five desiderata. Due to its modular architecture, NOMU can provide model uncertainty for any given (previously trained) NN if given access to its training data. We evaluate NOMU in various regression tasks and noiseless Bayesian optimization (BO) with costly evaluations. In regression, NOMU performs at least as well as state-of-the-art methods. In BO, NOMU even outperforms all considered benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/heiss22a.html PDF: https://proceedings.mlr.press/v162/heiss22a/heiss22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-heiss22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jakob M family: Heiss - given: Jakob family: Weissteiner - given: Hanna S family: Wutte - given: Sven family: Seuken - given: Josef family: Teichmann editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8708-8758 id: heiss22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8708 lastpage: 8758 published: 2022-06-28 00:00:00 +0000 - title: 'Scaling Out-of-Distribution Detection for Real-World Settings' abstract: 'Detecting out-of-distribution examples is important for safety-critical machine learning applications such as detecting novel biological phenomena and self-driving cars. However, existing research mainly focuses on simple small-scale settings.
To set the stage for more realistic out-of-distribution detection, we depart from small-scale settings and explore large-scale multiclass and multi-label settings with high-resolution images and thousands of classes. To make future work in real-world settings possible, we create new benchmarks for three large-scale settings. First, to test ImageNet multiclass anomaly detectors, we introduce the Species dataset containing over 700,000 images and over a thousand anomalous species. Second, we leverage ImageNet-21K to evaluate PASCAL VOC and COCO multilabel anomaly detectors. Third, we introduce a new benchmark for anomaly segmentation with road anomalies. We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work.' volume: 162 URL: https://proceedings.mlr.press/v162/hendrycks22a.html PDF: https://proceedings.mlr.press/v162/hendrycks22a/hendrycks22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hendrycks22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dan family: Hendrycks - given: Steven family: Basart - given: Mantas family: Mazeika - given: Andy family: Zou - given: Joseph family: Kwon - given: Mohammadreza family: Mostajabi - given: Jacob family: Steinhardt - given: Dawn family: Song editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8759-8773 id: hendrycks22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8759 lastpage: 8773 published: 2022-06-28 00:00:00 +0000 - title: 'Generalization Bounds using Lower Tail Exponents in Stochastic Optimizers' abstract: 'Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood. While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, it has mainly relied on continuous-time approximations, and a rigorous treatment for the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the lower tail exponent of the transition kernel associated with the optimizer around a local minimum, in both discrete- and continuous-time settings. To achieve this, we first prove a data- and algorithm-dependent generalization bound in terms of the celebrated Fernique-Talagrand functional applied to the trajectory of the optimizer. Then, we specialize this result by exploiting the Markovian structure of stochastic optimizers, and derive bounds in terms of their (data-dependent) transition kernels. We support our theory with empirical results from a variety of neural networks, showing correlations between generalization error and lower tail exponents.
volume: 162 URL: https://proceedings.mlr.press/v162/hodgkinson22a.html PDF: https://proceedings.mlr.press/v162/hodgkinson22a/hodgkinson22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hodgkinson22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liam family: Hodgkinson - given: Umut family: Simsekli - given: Rajiv family: Khanna - given: Michael family: Mahoney editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8774-8795 id: hodgkinson22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8774 lastpage: 8795 published: 2022-06-28 00:00:00 +0000 - title: 'Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology' abstract: 'We propose a fully unsupervised method to detect bias in contextualized embeddings. The method leverages the assortative information latently encoded by social networks and combines orthogonality regularization, structured sparsity learning, and graph neural networks to find the embedding subspace capturing this information. As a concrete example, we focus on the phenomenon of ideological bias: we introduce the concept of an ideological subspace, show how it can be found by applying our method to online discussion forums, and present techniques to probe it. Our experiments suggest that the ideological subspace encodes abstract evaluative semantics and reflects changes in the political left-right spectrum during the presidency of Donald Trump.' volume: 162 URL: https://proceedings.mlr.press/v162/hofmann22a.html PDF: https://proceedings.mlr.press/v162/hofmann22a/hofmann22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hofmann22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Valentin family: Hofmann - given: Janet family: Pierrehumbert - given: Hinrich family: Schütze editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8796-8810 id: hofmann22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8796 lastpage: 8810 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Laplace: Learning diverse classes of differential equations in the Laplace domain' abstract: 'Neural Ordinary Differential Equations model dynamical systems with ODEs learned by neural networks. However, ODEs are fundamentally inadequate to model systems with long-range dependencies or discontinuities, which are common in engineering and biological systems. Broader classes of differential equations (DE) have been proposed as remedies, including delay differential equations and integro-differential equations. Furthermore, Neural ODE suffers from numerical instability when modelling stiff ODEs and ODEs with piecewise forcing functions. In this work, we propose Neural Laplace, a unifying framework for learning diverse classes of DEs including all the aforementioned ones. Instead of modelling the dynamics in the time domain, we model it in the Laplace domain, where the history-dependencies and discontinuities in time can be represented as summations of complex exponentials. 
To make learning more efficient, we use the geometrical stereographic map of a Riemann sphere to induce more smoothness in the Laplace domain. In the experiments, Neural Laplace shows superior performance in modelling and extrapolating the trajectories of diverse classes of DEs, including the ones with complex history dependency and abrupt changes.' volume: 162 URL: https://proceedings.mlr.press/v162/holt22a.html PDF: https://proceedings.mlr.press/v162/holt22a/holt22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-holt22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samuel I family: Holt - given: Zhaozhi family: Qian - given: Mihaela prefix: van der family: Schaar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8811-8832 id: holt22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8811 lastpage: 8832 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Hierarchy in Bandits' abstract: 'Mean rewards of actions are often correlated. The form of these correlations may be complex and unknown a priori, such as the preferences of users for recommended products and their categories. To maximize statistical efficiency, it is important to leverage these correlations when learning. We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model with latent variables. Since the hierarchy can have multiple layers, we call it deep. We propose a hierarchical Thompson sampling algorithm (HierTS) for this problem and show how to implement it efficiently for Gaussian hierarchies. The efficient implementation is possible due to a novel exact hierarchical representation of the posterior, which itself is of independent interest. We use this exact posterior to analyze the Bayes regret of HierTS. Our regret bounds reflect the structure of the problem, that the regret decreases with more informative priors, and can be recast to highlight reduced dependence on the number of actions. We confirm these theoretical findings empirically, in both synthetic and real-world experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/hong22a.html PDF: https://proceedings.mlr.press/v162/hong22a/hong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Joey family: Hong - given: Branislav family: Kveton - given: Sumeet family: Katariya - given: Manzil family: Zaheer - given: Mohammad family: Ghavamzadeh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8833-8851 id: hong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8833 lastpage: 8851 published: 2022-06-28 00:00:00 +0000 - title: 'DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning' abstract: 'Federated Learning (FL) is a powerful technique to train a model on a server with data from several clients in a privacy-preserving manner. 
FL incurs significant communication costs because it repeatedly transmits the model between the server and clients. Recently proposed algorithms quantize the model parameters to efficiently compress FL communication. We find that dynamic adaptations of the quantization level can boost compression without sacrificing model quality. We introduce DAdaQuant as a doubly-adaptive quantization algorithm that dynamically changes the quantization level across time and different clients. Our experiments show that DAdaQuant consistently improves client$\rightarrow$server compression, outperforming the strongest non-adaptive baselines by up to $2.8\times$.' volume: 162 URL: https://proceedings.mlr.press/v162/honig22a.html PDF: https://proceedings.mlr.press/v162/honig22a/honig22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-honig22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Robert family: Hönig - given: Yiren family: Zhao - given: Robert family: Mullins editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8852-8866 id: honig22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8852 lastpage: 8866 published: 2022-06-28 00:00:00 +0000 - title: 'Equivariant Diffusion for Molecule Generation in 3D' abstract: 'This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types). In addition, we provide a probabilistic analysis which admits likelihood computation of molecules using our model. Experimentally, the proposed method significantly outperforms previous 3D molecular generative methods regarding the quality of generated samples and the efficiency at training time.' volume: 162 URL: https://proceedings.mlr.press/v162/hoogeboom22a.html PDF: https://proceedings.mlr.press/v162/hoogeboom22a/hoogeboom22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hoogeboom22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Emiel family: Hoogeboom - given: Vı́ctor Garcia family: Satorras - given: Clément family: Vignac - given: Max family: Welling editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8867-8887 id: hoogeboom22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8867 lastpage: 8887 published: 2022-06-28 00:00:00 +0000 - title: 'Conditional GANs with Auxiliary Discriminative Classifier' abstract: 'Conditional generative models aim to learn the underlying joint distribution of data and labels to achieve conditional data generation. Among them, the auxiliary classifier generative adversarial network (AC-GAN) has been widely used, but suffers from the problem of low intra-class diversity of the generated samples. 
The fundamental reason pointed out in this paper is that the classifier of AC-GAN is generator-agnostic, which therefore cannot provide informative guidance for the generator to approach the joint distribution, resulting in a minimization of the conditional entropy that decreases the intra-class diversity. Motivated by this understanding, we propose a novel conditional GAN with an auxiliary discriminative classifier (ADC-GAN) to resolve the above problem. Specifically, the proposed auxiliary discriminative classifier becomes generator-aware by recognizing the class-labels of the real data and the generated data discriminatively. Our theoretical analysis reveals that the generator can faithfully learn the joint distribution even without the original discriminator, making the proposed ADC-GAN robust to the value of the coefficient hyperparameter and the selection of the GAN loss, and stable during training. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of ADC-GAN in conditional generative modeling compared to state-of-the-art classifier-based and projection-based conditional GANs.' volume: 162 URL: https://proceedings.mlr.press/v162/hou22a.html PDF: https://proceedings.mlr.press/v162/hou22a/hou22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hou22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liang family: Hou - given: Qi family: Cao - given: Huawei family: Shen - given: Siyuan family: Pan - given: Xiaoshuang family: Li - given: Xueqi family: Cheng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8888-8902 id: hou22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8888 lastpage: 8902 published: 2022-06-28 00:00:00 +0000 - title: 'AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems' abstract: 'It is well-known that deep learning models are vulnerable to adversarial examples. Existing studies of adversarial training have made great progress against this challenge. As a typical trait, they often assume that the class distribution is overall balanced. However, long-tail datasets are ubiquitous in a wide spectrum of applications, where the amount of head class instances is significantly larger than the tail classes. Under such a scenario, AUC is a much more reasonable metric than accuracy since it is insensitive toward class distribution. Motivated by this, we present an early trial to explore adversarial training methods to optimize AUC. The main challenge lies in that the positive and negative examples are tightly coupled in the objective function. As a direct result, one cannot generate adversarial examples without a full scan of the dataset. To address this issue, based on a concavity regularization scheme, we reformulate the AUC optimization problem as a saddle point problem, where the objective becomes an instance-wise function. This leads to an end-to-end training protocol. Furthermore, we provide a convergence guarantee of the proposed training algorithm. Our analysis differs from the existing studies since the algorithm is asked to generate adversarial examples by calculating the gradient of a min-max problem. 
Finally, the extensive experimental results show the performance and robustness of our algorithm in three long-tail datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/hou22b.html PDF: https://proceedings.mlr.press/v162/hou22b/hou22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hou22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wenzheng family: Hou - given: Qianqian family: Xu - given: Zhiyong family: Yang - given: Shilong family: Bao - given: Yuan family: He - given: Qingming family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8903-8925 id: hou22b issued: date-parts: - 2022 - 6 - 28 firstpage: 8903 lastpage: 8925 published: 2022-06-28 00:00:00 +0000 - title: 'Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling' abstract: 'We introduce repriorisation, a data-dependent reparameterisation which transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. The repriorisation map acts directly on parameters, and its analytic simplicity complements the known neural network Gaussian process (NNGP) behaviour of wide BNNs in function space. Exploiting the repriorisation, we develop a Markov chain Monte Carlo (MCMC) posterior sampling algorithm which mixes faster the wider the BNN. This contrasts with the typically poor performance of MCMC in high dimensions. We observe up to 50x higher effective sample size relative to no reparametrisation for both fully-connected and residual networks. Improvements are achieved at all widths, with the margin between reparametrised and standard BNNs growing with layer width.' volume: 162 URL: https://proceedings.mlr.press/v162/hron22a.html PDF: https://proceedings.mlr.press/v162/hron22a/hron22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hron22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiri family: Hron - given: Roman family: Novak - given: Jeffrey family: Pennington - given: Jascha family: Sohl-Dickstein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8926-8945 id: hron22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8926 lastpage: 8945 published: 2022-06-28 00:00:00 +0000 - title: 'Learning inverse folding from millions of predicted structures' abstract: 'We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We augment training data by nearly three orders of magnitude by predicting structures for 12M protein sequences using AlphaFold2. 
Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods. The model generalizes to a variety of more complex tasks including design of protein complexes, partially masked structures, binding interfaces, and multiple states.' volume: 162 URL: https://proceedings.mlr.press/v162/hsu22a.html PDF: https://proceedings.mlr.press/v162/hsu22a/hsu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hsu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chloe family: Hsu - given: Robert family: Verkuil - given: Jason family: Liu - given: Zeming family: Lin - given: Brian family: Hie - given: Tom family: Sercu - given: Adam family: Lerer - given: Alexander family: Rives editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8946-8970 id: hsu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8946 lastpage: 8970 published: 2022-06-28 00:00:00 +0000 - title: 'Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation' abstract: 'We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s,a)$. Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP), and propose a novel computation-efficient algorithm, LSVI-UCB$^+$, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound where $H$ is the episode length, $d$ is the feature dimension, and $T$ is the number of steps. LSVI-UCB$^+$ builds on weighted ridge regression and upper confidence value iteration with a Bernstein-type exploration bonus. Our statistical results are obtained with novel analytical tools, including a new Bernstein self-normalized bound with conservatism on elliptical potentials, and refined analysis of the correction term. To the best of our knowledge, this is the first minimax optimal algorithm for linear MDPs up to logarithmic factors, which closes the $\sqrt{Hd}$ gap between the best known upper bound of $\widetilde{O}(\sqrt{H^3d^3T})$ in \cite{jin2020provably} and lower bound of $\Omega(Hd\sqrt{T})$ for linear MDPs.' 
volume: 162 URL: https://proceedings.mlr.press/v162/hu22a.html PDF: https://proceedings.mlr.press/v162/hu22a/hu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pihe family: Hu - given: Yu family: Chen - given: Longbo family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 8971-9019 id: hu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 8971 lastpage: 9019 published: 2022-06-28 00:00:00 +0000 - title: 'Neuron Dependency Graphs: A Causal Abstraction of Neural Networks' abstract: 'We discover that neural networks exhibit approximate logical dependencies among neurons, and we introduce Neuron Dependency Graphs (NDG) that extract and present them as directed graphs. In an NDG, each node corresponds to the boolean activation value of a neuron, and each edge models an approximate logical implication from one node to another. We show that the logical dependencies extracted from the training dataset generalize well to the test set. In addition to providing symbolic explanations to the neural network’s internal structure, NDGs can represent a Structural Causal Model. We empirically show that an NDG is a causal abstraction of the corresponding neural network that "unfolds" the same way under causal interventions using the theory by Geiger et al. (2021). Code is available at https://github.com/phimachine/ndg.' volume: 162 URL: https://proceedings.mlr.press/v162/hu22b.html PDF: https://proceedings.mlr.press/v162/hu22b/hu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yaojie family: Hu - given: Jin family: Tian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9020-9040 id: hu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 9020 lastpage: 9040 published: 2022-06-28 00:00:00 +0000 - title: 'Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL' abstract: 'Cooperative multi-agent reinforcement learning (MARL) is making rapid progress for solving tasks in a grid world and real-world scenarios, in which agents are given different attributes and goals, resulting in different behavior through the whole multi-agent task. In this study, we quantify the agent’s behavior difference and build its relationship with the policy performance via {\bf Role Diversity}, a metric to measure the characteristics of MARL tasks. We define role diversity from three perspectives: action-based, trajectory-based, and contribution-based to fully measure a multi-agent task. Through theoretical analysis, we find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity. The decomposed factors can significantly impact policy optimization in three popular directions including parameter sharing, communication mechanism, and credit assignment. 
The main experimental platforms are based on {\bf Multiagent Particle Environment (MPE) }and {\bf The StarCraft Multi-Agent Challenge (SMAC)}. Extensive experiments clearly show that role diversity can serve as a robust measurement for the characteristics of a multi-agent cooperation task and help diagnose whether the policy fits the current multi-agent system for better policy performance.' volume: 162 URL: https://proceedings.mlr.press/v162/hu22c.html PDF: https://proceedings.mlr.press/v162/hu22c/hu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Siyi family: Hu - given: Chuanlong family: Xie - given: Xiaodan family: Liang - given: Xiaojun family: Chang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9041-9071 id: hu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 9041 lastpage: 9071 published: 2022-06-28 00:00:00 +0000 - title: 'On the Role of Discount Factor in Offline Reinforcement Learning' abstract: 'Offline reinforcement learning (RL) enables effective learning from previously collected data without exploration, which shows great promise in real-world applications when exploration is expensive or even infeasible. The discount factor, $\gamma$, plays a vital role in improving online RL sample efficiency and estimation accuracy, but the role of the discount factor in offline RL is not well explored. This paper examines two distinct effects of $\gamma$ in offline RL with theoretical analysis, namely the regularization effect and the pessimism effect. On the one hand, $\gamma$ is a regulator to trade-off optimality with sample efficiency upon existing offline techniques. On the other hand, lower guidance $\gamma$ can also be seen as a way of pessimism where we optimize the policy’s performance in the worst possible models. We empirically verify the above theoretical observation with tabular MDPs and standard D4RL tasks. The results show that the discount factor plays an essential role in the performance of offline RL algorithms, both under small data regimes upon existing offline methods and in large data regimes without other conservative methods.' volume: 162 URL: https://proceedings.mlr.press/v162/hu22d.html PDF: https://proceedings.mlr.press/v162/hu22d/hu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hao family: Hu - given: Yiqin family: Yang - given: Qianchuan family: Zhao - given: Chongjie family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9072-9098 id: hu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 9072 lastpage: 9098 published: 2022-06-28 00:00:00 +0000 - title: 'Transformer Quality in Linear Time' abstract: 'We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences. 
First, we propose a simple layer named gated attention unit, which allows the use of a weaker single-head attention with minimal quality loss. We then propose a linear approximation method complementary to this new layer, which is accelerator-friendly and highly competitive in quality. The resulting model, named FLASH, matches the perplexity of improved Transformers over both short (512) and long (8K) context lengths, achieving training speedups of up to 4.9x on Wiki-40B and 12.1x on PG-19 for auto-regressive language modeling, and 4.8x on C4 for masked language modeling.' volume: 162 URL: https://proceedings.mlr.press/v162/hua22a.html PDF: https://proceedings.mlr.press/v162/hua22a/hua22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hua22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weizhe family: Hua - given: Zihang family: Dai - given: Hanxiao family: Liu - given: Quoc family: Le editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9099-9117 id: hua22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9099 lastpage: 9117 published: 2022-06-28 00:00:00 +0000 - title: 'Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents' abstract: 'Can world knowledge learned by large language models (LLMs) be used to act in interactive environments? In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. “make breakfast”), to a chosen set of actionable steps (e.g. “open fridge”). While prior work focused on learning from explicit step-by-step examples of how to act, we surprisingly find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into mid-level plans without any further training. However, the plans produced naively by LLMs often cannot map precisely to admissible actions. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions. Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards extracting actionable knowledge from language models.' 
volume: 162 URL: https://proceedings.mlr.press/v162/huang22a.html PDF: https://proceedings.mlr.press/v162/huang22a/huang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wenlong family: Huang - given: Pieter family: Abbeel - given: Deepak family: Pathak - given: Igor family: Mordatch editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9118-9147 id: huang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9118 lastpage: 9147 published: 2022-06-28 00:00:00 +0000 - title: 'Forward Operator Estimation in Generative Models with Kernel Transfer Operators' abstract: 'Generative models which use explicit density modeling (e.g., variational autoencoders, flow-based generative models) involve finding a mapping from a known distribution, e.g. Gaussian, to the unknown input distribution. This often requires searching over a class of non-linear functions (e.g., representable by a deep neural network). While effective in practice, the associated runtime/memory costs can increase rapidly, usually as a function of the performance desired in an application. We propose a substantially cheaper (and simpler) forward operator estimation strategy based on adapting known results on kernel transfer operators. We show that our formulation enables highly efficient distribution approximation and sampling, and offers surprisingly good empirical performance that compares favorably with powerful baselines, but with significant runtime savings. We show that the algorithm also performs well in small sample size settings (in brain imaging).' volume: 162 URL: https://proceedings.mlr.press/v162/huang22b.html PDF: https://proceedings.mlr.press/v162/huang22b/huang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhichun family: Huang - given: Rudrasis family: Chakraborty - given: Vikas family: Singh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9148-9172 id: huang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 9148 lastpage: 9172 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits' abstract: 'In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th ($1<\alpha\le 2$) moments bounded by $\sigma^\alpha$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}, when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a-priori. 
When $\alpha,\sigma$ are unknown, \texttt{HTINF} achieves a $\log T$-style instance-dependent regret in stochastic cases and an $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, achieving $\mathcal O(\sigma K^{1-\nicefrac 1\alpha}T^{\nicefrac{1}{\alpha}})$ minimax optimal regret even in adversarial settings, without prior knowledge on $\alpha$ and $\sigma$. This result matches the known regret lower-bound (Bubeck et al., 2013), which assumed a stochastic environment and that $\alpha$ and $\sigma$ are both known. To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to both $\alpha$ and $\sigma$ to achieve the optimal gap-independent regret bound in the classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.' volume: 162 URL: https://proceedings.mlr.press/v162/huang22c.html PDF: https://proceedings.mlr.press/v162/huang22c/huang22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiatai family: Huang - given: Yan family: Dai - given: Longbo family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9173-9200 id: huang22c issued: date-parts: - 2022 - 6 - 28 firstpage: 9173 lastpage: 9200 published: 2022-06-28 00:00:00 +0000 - title: 'Frustratingly Easy Transferability Estimation' abstract: 'Transferability estimation has been an essential tool in selecting a pre-trained model and the layers in it for transfer learning, so as to maximize the performance on a target task and prevent negative transfer. Existing estimation algorithms either require intensive training on target tasks or have difficulties in evaluating the transferability between layers. To this end, we propose a simple, efficient, and effective transferability measure named TransRate. Through a single pass over examples of a target task, TransRate measures the transferability as the mutual information between features of target examples extracted by a pre-trained model and their labels. We overcome the challenge of efficient mutual information estimation by resorting to the coding rate, which serves as an effective alternative to entropy. From the perspective of feature representation, the resulting TransRate evaluates both completeness (whether features contain sufficient information of a target task) and compactness (whether features of each class are compact enough for good generalization) of pre-trained features. Theoretically, we have analyzed the close connection of TransRate to the performance after transfer learning. Despite its extraordinary simplicity in 10 lines of code, TransRate performs remarkably well in extensive evaluations on 35 pre-trained models and 16 downstream tasks.
volume: 162 URL: https://proceedings.mlr.press/v162/huang22d.html PDF: https://proceedings.mlr.press/v162/huang22d/huang22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Long-Kai family: Huang - given: Junzhou family: Huang - given: Yu family: Rong - given: Qiang family: Yang - given: Ying family: Wei editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9201-9225 id: huang22d issued: date-parts: - 2022 - 6 - 28 firstpage: 9201 lastpage: 9225 published: 2022-06-28 00:00:00 +0000 - title: 'Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)' abstract: 'Despite the remarkable success of deep multi-modal learning in practice, it has not been well-explained in theory. Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network across different combinations of modalities on various tasks, which is counter-intuitive since multiple signals would bring more information (Wang et al., 2020). This work provides a theoretical explanation for the emergence of such performance gap in neural networks for the prevalent joint training framework. Based on a simplified data distribution that captures the realistic property of multi-modal data, we prove that for multi-modal late-fusion network with (smoothed) ReLU activation trained jointly by gradient descent, different modalities will compete with each other and only a subset of modalities will be learned by its corresponding encoder networks. We refer to this phenomenon as modality competition, and the losing modalities, which fail to be discovered, are the origins where the sub-optimality of joint training comes from. In contrast, for uni-modal networks with similar learning settings, we provably show that the networks will focus on learning modality-associated features. Experimentally, we illustrate that modality competition matches the intrinsic behavior of late-fusion joint training to supplement our theoretical results. To the best of our knowledge, our work is the first theoretical treatment towards the degenerating aspect of multi-modal learning in neural networks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/huang22e.html PDF: https://proceedings.mlr.press/v162/huang22e/huang22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yu family: Huang - given: Junyang family: Lin - given: Chang family: Zhou - given: Hongxia family: Yang - given: Longbo family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9226-9259 id: huang22e issued: date-parts: - 2022 - 6 - 28 firstpage: 9226 lastpage: 9259 published: 2022-06-28 00:00:00 +0000 - title: 'Action-Sufficient State Representation Learning for Control with Structural Constraints' abstract: 'Perceived signals in real-world scenarios are usually high-dimensional and noisy, and finding and using their representation that contains essential and sufficient information required by downstream decision-making tasks will help improve computational efficiency and generalization ability in the tasks. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning. Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space to improve sample efficiency.' volume: 162 URL: https://proceedings.mlr.press/v162/huang22f.html PDF: https://proceedings.mlr.press/v162/huang22f/huang22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Biwei family: Huang - given: Chaochao family: Lu - given: Liu family: Leqi - given: Jose Miguel family: Hernandez-Lobato - given: Clark family: Glymour - given: Bernhard family: Schölkopf - given: Kun family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9260-9279 id: huang22f issued: date-parts: - 2022 - 6 - 28 firstpage: 9260 lastpage: 9279 published: 2022-06-28 00:00:00 +0000 - title: '3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design' abstract: 'Deep learning has achieved tremendous success in designing novel chemical compounds with desirable pharmaceutical properties. In this work, we focus on a new type of drug design problem — generating a small “linker” to physically attach two independent molecules with their distinct functions. 
The main computational challenges include: 1) the generation of linkers is conditional on the two given molecules, in contrast to generating complete molecules from scratch in previous works; 2) linkers heavily depend on the anchor atoms of the two molecules to be connected, which are not known beforehand; 3) 3D structures and orientations of the molecules need to be considered to avoid atom clashes, for which equivariance to the E(3) group is necessary. To address these problems, we propose a conditional generative model, named 3DLinker, which is able to predict anchor atoms and jointly generate linker graphs and their 3D structures based on an E(3) equivariant graph variational autoencoder. As far as we know, no previous model could achieve this task. We compare our model with multiple conditional generative models modified from other molecular design tasks and find that our model has a significantly higher rate of recovering molecular graphs and, more importantly, of accurately predicting the 3D coordinates of all the atoms.' volume: 162 URL: https://proceedings.mlr.press/v162/huang22g.html PDF: https://proceedings.mlr.press/v162/huang22g/huang22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yinan family: Huang - given: Xingang family: Peng - given: Jianzhu family: Ma - given: Muhan family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9280-9294 id: huang22g issued: date-parts: - 2022 - 6 - 28 firstpage: 9280 lastpage: 9294 published: 2022-06-28 00:00:00 +0000 - title: 'SDQ: Stochastic Differentiable Quantization with Mixed Precision' abstract: 'In order to deploy deep models in a computationally efficient manner, model quantization approaches have been frequently used. In addition, as new hardware supports various-bit arithmetic operations, recent research on mixed precision quantization (MPQ) has begun to fully leverage the capacity of representation by searching various bitwidths for different layers and modules in a network. However, previous studies mainly search the MPQ strategy in a costly scheme using reinforcement learning, neural architecture search, etc., or simply utilize partial prior knowledge for bitwidth distribution, which might be biased and sub-optimal. In this work, we present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy in a more flexible and globally-optimized space with a smoother gradient approximation. Particularly, Differentiable Bitwidth Parameters (DBPs) are employed as the probability factors in stochastic quantization between adjacent bitwidths. After the optimal MPQ strategy is acquired, we further train our network with the entropy-aware bin regularization and knowledge distillation. We extensively evaluate our method on different networks, hardware platforms (GPUs and FPGAs), and datasets. SDQ outperforms all other state-of-the-art mixed- or single-precision quantization methods with lower bitwidth, and is even better than the original full-precision counterparts across various ResNet and MobileNet families, demonstrating the effectiveness and superiority of our method. Code will be publicly available.
volume: 162 URL: https://proceedings.mlr.press/v162/huang22h.html PDF: https://proceedings.mlr.press/v162/huang22h/huang22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xijie family: Huang - given: Zhiqiang family: Shen - given: Shichao family: Li - given: Zechun family: Liu - given: Hu family: Xianghong - given: Jeffry family: Wicaksana - given: Eric family: Xing - given: Kwang-Ting family: Cheng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9295-9309 id: huang22h issued: date-parts: - 2022 - 6 - 28 firstpage: 9295 lastpage: 9309 published: 2022-06-28 00:00:00 +0000 - title: 'Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology' abstract: 'We develop a general framework unifying several gradient-based stochastic optimization methods for empirical risk minimization problems both in centralized and distributed scenarios. The framework hinges on the introduction of an augmented graph consisting of nodes modeling the samples and edges modeling both the inter-device communication and intra-device stochastic gradient computation. By designing properly the topology of the augmented graph, we are able to recover as special cases the renowned Local-SGD and DSGD algorithms, and provide a unified perspective for variance-reduction (VR) and gradient-tracking (GT) methods such as SAGA, Local-SVRG and GT-SAGA. We also provide a unified convergence analysis for smooth and (strongly) convex objectives relying on a proper structured Lyapunov function, and the obtained rate can recover the best known results for many existing algorithms. The rate results further reveal that VR and GT methods can effectively eliminate data heterogeneity within and across devices, respectively, enabling the exact convergence of the algorithm to the optimal solution. Numerical experiments confirm the findings in this paper.' volume: 162 URL: https://proceedings.mlr.press/v162/huang22i.html PDF: https://proceedings.mlr.press/v162/huang22i/huang22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yan family: Huang - given: Ying family: Sun - given: Zehan family: Zhu - given: Changzhi family: Yan - given: Jinming family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9310-9345 id: huang22i issued: date-parts: - 2022 - 6 - 28 firstpage: 9310 lastpage: 9345 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Representation Learning via Adaptive Context Pooling' abstract: 'Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. 
In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer. The pooling weights and support size are adaptively determined, allowing the pooled features to encode meaningful context with varying scale. We show that ContextPool makes attention models more expressive, achieving strong performance often with fewer layers and thus significantly reduced cost. Experiments validate that our ContextPool module, when plugged into transformer models, matches or surpasses state-of-the-art performance using less compute on several language and image benchmarks, outperforms recent works with learned context sizes or sparse attention patterns, and is also applicable to ConvNets for efficient feature learning.' volume: 162 URL: https://proceedings.mlr.press/v162/huang22j.html PDF: https://proceedings.mlr.press/v162/huang22j/huang22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chen family: Huang - given: Walter family: Talbott - given: Navdeep family: Jaitly - given: Joshua M family: Susskind editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9346-9355 id: huang22j issued: date-parts: - 2022 - 6 - 28 firstpage: 9346 lastpage: 9355 published: 2022-06-28 00:00:00 +0000 - title: 'On the Learning of Non-Autoregressive Transformers' abstract: 'Non-autoregressive Transformer (NAT) is a family of text generation models, which aims to reduce the decoding latency by predicting the whole sentences in parallel. However, such latency reduction sacrifices the ability to capture left-to-right dependencies, thereby making NAT learning very challenging. In this paper, we present theoretical and empirical analyses to reveal the challenges of NAT learning and propose a unified perspective to understand existing successes. First, we show that simply training NAT by maximizing the likelihood can lead to an approximation of marginal distributions but drops all dependencies between tokens, where the dropped information can be measured by the dataset’s conditional total correlation. Second, we formalize many previous objectives in a unified framework and show that their success can be concluded as maximizing the likelihood on a proxy distribution, leading to a reduced information loss. Empirical studies show that our perspective can explain the phenomena in NAT learning and guide the design of new training methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/huang22k.html PDF: https://proceedings.mlr.press/v162/huang22k/huang22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fei family: Huang - given: Tianhua family: Tao - given: Hao family: Zhou - given: Lei family: Li - given: Minlie family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9356-9376 id: huang22k issued: date-parts: - 2022 - 6 - 28 firstpage: 9356 lastpage: 9376 published: 2022-06-28 00:00:00 +0000 - title: 'Going Deeper into Permutation-Sensitive Graph Neural Networks' abstract: 'The invariance to permutations of the adjacency matrix, i.e., graph isomorphism, is an overarching requirement for Graph Neural Networks (GNNs). Conventionally, this prerequisite can be satisfied by the invariant operations over node permutations when aggregating messages. However, such an invariant manner may ignore the relationships among neighboring nodes, thereby hindering the expressivity of GNNs. In this work, we devise an efficient permutation-sensitive aggregation mechanism via permutation groups, capturing pairwise correlations between neighboring nodes. We prove that our approach is strictly more powerful than the 2-dimensional Weisfeiler-Lehman (2-WL) graph isomorphism test and not less powerful than the 3-WL test. Moreover, we prove that our approach achieves the linear sampling complexity. Comprehensive experiments on multiple synthetic and real-world datasets demonstrate the superiority of our model.' volume: 162 URL: https://proceedings.mlr.press/v162/huang22l.html PDF: https://proceedings.mlr.press/v162/huang22l/huang22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhongyu family: Huang - given: Yingheng family: Wang - given: Chaozhuo family: Li - given: Huiguang family: He editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9377-9409 id: huang22l issued: date-parts: - 2022 - 6 - 28 firstpage: 9377 lastpage: 9409 published: 2022-06-28 00:00:00 +0000 - title: 'Directed Acyclic Transformer for Non-Autoregressive Machine Translation' abstract: 'Non-autoregressive Transformers (NATs) significantly reduce the decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between the tokens for generating multiple possible translations. In this paper, we propose Directed Acyclic Transfomer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion. 
Experiments on the raw training data of WMT benchmark show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, which is the first NAT model that achieves competitive results with autoregressive Transformers without relying on knowledge distillation.' volume: 162 URL: https://proceedings.mlr.press/v162/huang22m.html PDF: https://proceedings.mlr.press/v162/huang22m/huang22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huang22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fei family: Huang - given: Hao family: Zhou - given: Yang family: Liu - given: Hang family: Li - given: Minlie family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9410-9428 id: huang22m issued: date-parts: - 2022 - 6 - 28 firstpage: 9410 lastpage: 9428 published: 2022-06-28 00:00:00 +0000 - title: 'Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors' abstract: 'Defining meaningful distances between samples in a dataset is a fundamental problem in machine learning. Optimal Transport (OT) lifts a distance between features (the "ground metric") to a geometrically meaningful distance between samples. However, there is usually no straightforward choice of ground metric. Supervised ground metric learning approaches exist but require labeled data. In absence of labels, only ad-hoc ground metrics remain. Unsupervised ground metric learning is thus a fundamental problem to enable data-driven applications of OT. In this paper, we propose for the first time a canonical answer by simultaneously computing an OT distance between samples and between features of a dataset. These distance matrices emerge naturally as positive singular vectors of the function mapping ground metrics to OT distances. We provide criteria to ensure the existence and uniqueness of these singular vectors. We then introduce scalable computational methods to approximate them in high-dimensional settings, using stochastic approximation and entropic regularization. Finally, we showcase Wasserstein Singular Vectors on a single-cell RNA-sequencing dataset.' volume: 162 URL: https://proceedings.mlr.press/v162/huizing22a.html PDF: https://proceedings.mlr.press/v162/huizing22a/huizing22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huizing22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Geert-Jan family: Huizing - given: Laura family: Cantini - given: Gabriel family: Peyré editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9429-9443 id: huizing22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9429 lastpage: 9443 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Kernel Density Estimation with Median-of-Means principle' abstract: 'In this paper, we introduce a robust non-parametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). 
This estimator is shown to achieve robustness for a large class of anomalous data, potentially adversarial. In particular, while previous works only prove consistency results under very specific contamination models, this work provides finite-sample high-probability error-bounds without any prior knowledge on the outliers. To highlight the robustness of our method, we introduce an influence function adapted to the considered OUI framework. Finally, we show that MoM-KDE achieves competitive results when compared with other robust kernel estimators, while having significantly lower computational complexity.' volume: 162 URL: https://proceedings.mlr.press/v162/humbert22a.html PDF: https://proceedings.mlr.press/v162/humbert22a/humbert22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-humbert22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pierre family: Humbert - given: Batiste Le family: Bars - given: Ludovic family: Minvielle editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9444-9465 id: humbert22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9444 lastpage: 9465 published: 2022-06-28 00:00:00 +0000 - title: 'A data-driven approach for learning to control computers' abstract: 'It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. Instead of focusing on hand-designed curricula and specialized action spaces, we focus on developing a scalable method centered on reinforcement learning combined with behavioural priors informed by actual human-computer interactions. We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. These results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.' 
volume: 162 URL: https://proceedings.mlr.press/v162/humphreys22a.html PDF: https://proceedings.mlr.press/v162/humphreys22a/humphreys22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-humphreys22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peter C family: Humphreys - given: David family: Raposo - given: Tobias family: Pohlen - given: Gregory family: Thornton - given: Rachita family: Chhaparia - given: Alistair family: Muldal - given: Josh family: Abramson - given: Petko family: Georgiev - given: Adam family: Santoro - given: Timothy family: Lillicrap editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9466-9482 id: humphreys22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9466 lastpage: 9482 published: 2022-06-28 00:00:00 +0000 - title: 'Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization' abstract: 'Plug-and-Play (PnP) methods solve ill-posed inverse problems through iterative proximal algorithms by replacing a proximal operator by a denoising operation. When applied with deep neural network denoisers, these methods have shown state-of-the-art visual performance for image restoration problems. However, their theoretical convergence analysis is still incomplete. Most of the existing convergence results consider nonexpansive denoisers, which is non-realistic, or limit their analysis to strongly convex data-fidelity terms in the inverse problem to solve. Recently, it was proposed to train the denoiser as a gradient descent step on a functional parameterized by a deep neural network. Using such a denoiser guarantees the convergence of the PnP version of the Half-Quadratic-Splitting (PnP-HQS) iterative algorithm. In this paper, we show that this gradient denoiser can actually correspond to the proximal operator of another scalar function. Given this new result, we exploit the convergence theory of proximal algorithms in the nonconvex setting to obtain convergence results for PnP-PGD (Proximal Gradient Descent) and PnP-ADMM (Alternating Direction Method of Multipliers). When built on top of a smooth gradient denoiser, we show that PnP-PGD and PnP-ADMM are convergent and target stationary points of an explicit functional. These convergence results are confirmed with numerical experiments on deblurring, super-resolution and inpainting.' 
volume: 162 URL: https://proceedings.mlr.press/v162/hurault22a.html PDF: https://proceedings.mlr.press/v162/hurault22a/hurault22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-hurault22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samuel family: Hurault - given: Arthur family: Leclaire - given: Nicolas family: Papadakis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9483-9505 id: hurault22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9483 lastpage: 9505 published: 2022-06-28 00:00:00 +0000 - title: 'Inverse Contextual Bandits: Learning How Behavior Evolves over Time' abstract: 'Understanding a decision-maker’s priorities by observing their behavior is critical for transparency and accountability in decision processes{—}such as in healthcare. Though conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: Medical practice is constantly evolving as clinical professionals fine-tune their knowledge over time. For instance, as the medical community’s understanding of organ transplantations has progressed over the years, a pertinent question is: How have actual organ allocation policies been evolving? To give an answer, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent’s non-stationary knowledge of the world, as well as operating in an offline manner. First, we model the evolving behavior of decision-makers in terms of contextual bandits, and formalize the problem of Inverse Contextual Bandits ("ICB"). Second, we propose two concrete algorithms as solutions, learning parametric and non-parametric representations of an agent’s behavior. Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating the accuracy of our algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/huyuk22a.html PDF: https://proceedings.mlr.press/v162/huyuk22a/huyuk22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-huyuk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alihan family: Hüyük - given: Daniel family: Jarrett - given: Mihaela prefix: van der family: Schaar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9506-9524 id: huyuk22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9506 lastpage: 9524 published: 2022-06-28 00:00:00 +0000 - title: 'Datamodels: Understanding Predictions with Data and Data with Predictions' abstract: 'We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data. 
For any fixed “target” example $x$, training set $S$, and learning algorithm, a datamodel is a parameterized function $2^S \to \mathbb{R}$ that for any subset of $S’ \subset S$—using only information about which examples of $S$ are contained in $S’$—predicts the outcome of training a model on $S’$ and evaluating on $x$. Despite the complexity of the underlying process being approximated (e.g. end-to-end training and evaluation of deep neural networks), we show that even simple linear datamodels successfully predict model outputs. We then demonstrate that datamodels give rise to a variety of applications, such as: accurately predicting the effect of dataset counterfactuals; identifying brittle predictions; finding semantically similar examples; quantifying train-test leakage; and embedding data into a well-behaved and feature-rich representation space.' volume: 162 URL: https://proceedings.mlr.press/v162/ilyas22a.html PDF: https://proceedings.mlr.press/v162/ilyas22a/ilyas22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ilyas22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrew family: Ilyas - given: Sung Min family: Park - given: Logan family: Engstrom - given: Guillaume family: Leclerc - given: Aleksander family: Madry editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9525-9587 id: ilyas22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9525 lastpage: 9587 published: 2022-06-28 00:00:00 +0000 - title: 'Parsimonious Learning-Augmented Caching' abstract: 'Learning-augmented algorithms—in which, traditional algorithms are augmented with machine-learned predictions—have emerged as a framework to go beyond worst-case analysis. The overarching goal is to design algorithms that perform near-optimally when the predictions are accurate yet retain certain worst-case guarantees irrespective of the accuracy of the predictions. This framework has been successfully applied to online problems such as caching where the predictions can be used to alleviate uncertainties. In this paper we introduce and study the setting in which the learning-augmented algorithm can utilize the predictions parsimoniously. We consider the caching problem—which has been extensively studied in the learning-augmented setting—and show that one can achieve quantitatively similar results but only using a sublinear number of predictions.' 
volume: 162 URL: https://proceedings.mlr.press/v162/im22a.html PDF: https://proceedings.mlr.press/v162/im22a/im22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-im22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sungjin family: Im - given: Ravi family: Kumar - given: Aditya family: Petety - given: Manish family: Purohit editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9588-9601 id: im22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9588 lastpage: 9601 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Optimization for Distributionally Robust Chance-constrained Problem' abstract: 'In black-box function optimization, we need to consider not only controllable design variables but also uncontrollable stochastic environment variables. In such cases, it is necessary to solve the optimization problem by taking into account the uncertainty of the environmental variables. The chance-constrained (CC) problem, the problem of maximizing the expected value under a certain level of constraint satisfaction probability, is one of the practically important problems in the presence of environmental variables. In this study, we consider the distributionally robust CC (DRCC) problem and propose a novel DRCC Bayesian optimization method for the case where the distribution of the environmental variables cannot be precisely specified. We show that the proposed method can find an arbitrarily accurate solution with high probability in a finite number of trials, and we confirm the usefulness of the proposed method through numerical experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/inatsu22a.html PDF: https://proceedings.mlr.press/v162/inatsu22a/inatsu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-inatsu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yu family: Inatsu - given: Shion family: Takeno - given: Masayuki family: Karasuyama - given: Ichiro family: Takeuchi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9602-9621 id: inatsu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9602 lastpage: 9621 published: 2022-06-28 00:00:00 +0000 - title: 'LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial Optimisation' abstract: 'Combinatorial Optimisation problems arise in several application domains and are often formulated in terms of graphs. Many of these problems are NP-hard, but exact solutions are not always needed. Several heuristics have been developed to provide near-optimal solutions; however, they do not typically scale well with the size of the graph. We propose a low-complexity approach for identifying a (possibly much smaller) subgraph of the original graph where the heuristics can be run in reasonable time and with a high likelihood of finding a global near-optimal solution.
The core component of our approach is LeNSE, a reinforcement learning algorithm that learns how to navigate the space of possible subgraphs using a Euclidean subgraph embedding as its map. To solve CO problems, LeNSE is provided with a discriminative embedding trained using any existing heuristic on only a small portion of the original graph. When tested on three problems (vertex cover, max-cut and influence maximisation) using real graphs with up to $10$ million edges, LeNSE identifies small subgraphs yielding solutions comparable to those found by running the heuristics on the entire graph, but at a fraction of the total run time. Code for the experiments is available in the public GitHub repo at https://github.com/davidireland3/LeNSE.' volume: 162 URL: https://proceedings.mlr.press/v162/ireland22a.html PDF: https://proceedings.mlr.press/v162/ireland22a/ireland22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ireland22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: David family: Ireland - given: Giovanni family: Montana editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9622-9638 id: ireland22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9622 lastpage: 9638 published: 2022-06-28 00:00:00 +0000 - title: 'The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention' abstract: 'Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the entire training experience. While this has been technically known since the 1960s, no prior work has effectively studied the operations of NNs in such a form, presumably due to prohibitive time and space complexities and impractical model sizes, all of them growing linearly with the number of training patterns, which may get very large. However, this dual formulation offers a possibility of directly visualising how an NN makes use of training patterns at test time, by examining the corresponding attention weights. We conduct experiments on small-scale supervised image classification tasks in single-task, multi-task, and continual learning settings, as well as language modelling, and discuss potentials and limits of this view for better understanding and interpreting how NNs exploit training patterns. Our code is public.'
volume: 162 URL: https://proceedings.mlr.press/v162/irie22a.html PDF: https://proceedings.mlr.press/v162/irie22a/irie22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-irie22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kazuki family: Irie - given: Róbert family: Csordás - given: Jürgen family: Schmidhuber editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9639-9659 id: irie22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9639 lastpage: 9659 published: 2022-06-28 00:00:00 +0000 - title: 'A Modern Self-Referential Weight Matrix That Learns to Modify Itself' abstract: 'The weight matrix (WM) of a neural network (NN) is its program. The programs of many traditional NNs are learned through gradient descent in some error function, then remain fixed. The WM of a self-referential NN, however, can keep rapidly modifying all of itself during runtime. In principle, such NNs can meta-learn to learn, and meta-meta-learn to meta-learn to learn, and so on, in the sense of recursive self-improvement. While NN architectures potentially capable of implementing such behaviour have been proposed since the ’90s, there have been few if any practical studies. Here we revisit such NNs, building upon recent successes of fast weight programmers and closely related linear Transformers. We propose a scalable self-referential WM (SRWM) that learns to use outer products and the delta update rule to modify itself. We evaluate our SRWM in supervised few-shot learning and in multi-task reinforcement learning with procedurally generated game environments. Our experiments demonstrate both practical applicability and competitive performance of the proposed SRWM. Our code is public.' volume: 162 URL: https://proceedings.mlr.press/v162/irie22b.html PDF: https://proceedings.mlr.press/v162/irie22b/irie22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-irie22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kazuki family: Irie - given: Imanol family: Schlag - given: Róbert family: Csordás - given: Jürgen family: Schmidhuber editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9660-9677 id: irie22b issued: date-parts: - 2022 - 6 - 28 firstpage: 9660 lastpage: 9677 published: 2022-06-28 00:00:00 +0000 - title: 'Revisiting Online Submodular Minimization: Gap-Dependent Regret Bounds, Best of Both Worlds and Adversarial Robustness' abstract: 'In this paper, we consider online decision problems with submodular loss functions. For such problems, existing studies have only dealt with worst-case analysis. This study goes beyond worst-case analysis to show instance-dependent regret bounds. More precisely, for each of the full-information and bandit-feedback settings, we propose an algorithm that achieves a gap-dependent O(log T)-regret bound in the stochastic environment and is comparable to the best existing algorithm in the adversarial environment. 
The proposed algorithms also work well in the stochastic environment with adversarial corruptions, which is an intermediate setting between the stochastic and adversarial environments.' volume: 162 URL: https://proceedings.mlr.press/v162/ito22a.html PDF: https://proceedings.mlr.press/v162/ito22a/ito22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ito22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shinji family: Ito editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9678-9694 id: ito22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9678 lastpage: 9694 published: 2022-06-28 00:00:00 +0000 - title: 'Modeling Strong and Human-Like Gameplay with KL-Regularized Search' abstract: 'We consider the task of accurately modeling strong human policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans (e.g., by sometimes committing blunders), while self-play learning and search techniques such as AlphaZero lead to strong performance but may produce policies that differ markedly from human behavior. In chess and Go, we show that regularized search algorithms that penalize KL divergence from an imitation-learned policy yield higher prediction accuracy of strong humans and better performance than imitation learning alone. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that using this algorithm for search in no-press Diplomacy yields a policy that matches the human prediction accuracy of imitation learning while being substantially stronger.' volume: 162 URL: https://proceedings.mlr.press/v162/jacob22a.html PDF: https://proceedings.mlr.press/v162/jacob22a/jacob22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jacob22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Athul Paul family: Jacob - given: David J family: Wu - given: Gabriele family: Farina - given: Adam family: Lerer - given: Hengyuan family: Hu - given: Anton family: Bakhtin - given: Jacob family: Andreas - given: Noam family: Brown editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9695-9728 id: jacob22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9695 lastpage: 9728 published: 2022-06-28 00:00:00 +0000 - title: 'A deep convolutional neural network that is invariant to time rescaling' abstract: 'Human learners can readily understand speech, or a melody, when it is presented slower or faster than usual. This paper presents a deep CNN (SITHCon) that uses a logarithmically compressed temporal representation at each level. Because rescaling the time of the input results in a translation of $\log$ time, and because the output of the convolution is invariant to translations, this network can generalize to out-of-sample data that are temporal rescalings of a learned pattern. 
We compare the performance of SITHCon to a Temporal Convolution Network (TCN) on classification and regression problems with both univariate and multivariate time series. We find that SITHCon, unlike TCN, generalizes robustly over rescalings of about an order of magnitude. Moreover, we show that the network can generalize over exponentially large scales without retraining the weights simply by extending the range of the logarithmically-compressed temporal memory.' volume: 162 URL: https://proceedings.mlr.press/v162/jacques22a.html PDF: https://proceedings.mlr.press/v162/jacques22a/jacques22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jacques22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Brandon G family: Jacques - given: Zoran family: Tiganj - given: Aakash family: Sarkar - given: Marc family: Howard - given: Per family: Sederberg editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9729-9738 id: jacques22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9729 lastpage: 9738 published: 2022-06-28 00:00:00 +0000 - title: 'Input Dependent Sparse Gaussian Processes' abstract: 'Gaussian Processes (GPs) are non-parametric models that provide accurate uncertainty estimates. Nevertheless, they have a cubic cost in the number of data instances $N$. To overcome this, sparse GP approximations are used, in which a set of $M \ll N$ inducing points is introduced. The location of the inducing points is learned by considering them parameters of an approximate posterior distribution $q$. Sparse GPs, combined with stochastic variational inference for inferring $q$ have a cost per iteration in $\mathcal{O}(M^3)$. Critically, the inducing points determine the flexibility of the model and they are often located in regions where the latent function changes. A limitation is, however, that in some tasks a large number of inducing points may be required to obtain good results. To alleviate this, we propose here to amortize the computation of the inducing points locations, as well as the parameters of $q$. For this, we use a neural network that receives a data instance as an input and outputs the corresponding inducing points locations and the parameters of $q$. We evaluate our method in several experiments, showing that it performs similar or better than other state-of-the-art sparse variational GPs. However, in our method the number of inducing points is reduced drastically since they depend on the input data. This makes our method scale to larger datasets and have faster training and prediction times.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jafrasteh22a.html PDF: https://proceedings.mlr.press/v162/jafrasteh22a/jafrasteh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jafrasteh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bahram family: Jafrasteh - given: Carlos family: Villacampa-Calvo - given: Daniel family: Hernandez-Lobato editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9739-9759 id: jafrasteh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9739 lastpage: 9759 published: 2022-06-28 00:00:00 +0000 - title: 'Regret Minimization with Performative Feedback' abstract: 'In performative prediction, the deployment of a predictive model triggers a shift in the data distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy a model to get feedback about the distribution it induces. We study the problem of finding near-optimal models under performativity while maintaining low regret. On the surface, this problem might seem equivalent to a bandit problem. However, it exhibits a fundamentally richer feedback structure that we refer to as performative feedback: after every deployment, the learner receives samples from the shifted distribution rather than bandit feedback about the reward. Our main contribution is regret bounds that scale only with the complexity of the distribution shifts and not that of the reward function. The key algorithmic idea is careful exploration of the distribution shifts that informs a novel construction of confidence bounds on the risk of unexplored models. The construction only relies on smoothness of the shifts and does not assume convexity. More broadly, our work establishes a conceptual approach for leveraging tools from the bandits literature for the purpose of regret minimization with performative feedback.' volume: 162 URL: https://proceedings.mlr.press/v162/jagadeesan22a.html PDF: https://proceedings.mlr.press/v162/jagadeesan22a/jagadeesan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jagadeesan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Meena family: Jagadeesan - given: Tijana family: Zrnic - given: Celestine family: Mendler-Dünner editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9760-9785 id: jagadeesan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9760 lastpage: 9785 published: 2022-06-28 00:00:00 +0000 - title: 'Biological Sequence Design with GFlowNets' abstract: 'Design of de novo biological sequences with desired properties, like protein and DNA sequences, often involves an active loop with several rounds of molecule ideation and expensive wet-lab evaluations. These experiments can consist of multiple stages, with increasing levels of precision and cost of evaluation, where candidates are filtered. This makes the diversity of proposed candidates a key consideration in the ideation phase. 
In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round. We also propose a scheme to incorporate existing labeled datasets of candidates, in addition to a reward function, to speed up learning in GFlowNets. We present empirical results on several biological sequence design tasks, and we find that our method generates more diverse and novel batches with high scoring candidates compared to existing approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/jain22a.html PDF: https://proceedings.mlr.press/v162/jain22a/jain22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jain22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Moksh family: Jain - given: Emmanuel family: Bengio - given: Alex family: Hernandez-Garcia - given: Jarrid family: Rector-Brooks - given: Bonaventure F. P. family: Dossou - given: Chanakya Ajit family: Ekbote - given: Jie family: Fu - given: Tianyu family: Zhang - given: Michael family: Kilgour - given: Dinghuai family: Zhang - given: Lena family: Simine - given: Payel family: Das - given: Yoshua family: Bengio editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9786-9801 id: jain22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9786 lastpage: 9801 published: 2022-06-28 00:00:00 +0000 - title: 'Combining Diverse Feature Priors' abstract: 'To improve model generalization, model designers often restrict the features that their models use, either implicitly or explicitly. In this work, we explore the design space of leveraging such feature priors by viewing them as distinct perspectives on the data. Specifically, we find that models trained with diverse sets of explicit feature priors have less overlapping failure modes, and can thus be combined more effectively. Moreover, we demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other’s mistakes, which, in turn, leads to better generalization and resilience to spurious correlations.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jain22b.html PDF: https://proceedings.mlr.press/v162/jain22b/jain22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jain22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Saachi family: Jain - given: Dimitris family: Tsipras - given: Aleksander family: Madry editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9802-9832 id: jain22b issued: date-parts: - 2022 - 6 - 28 firstpage: 9802 lastpage: 9832 published: 2022-06-28 00:00:00 +0000 - title: 'Training Your Sparse Neural Network Better with Any Mask' abstract: 'Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity. As research effort focuses on increasingly sophisticated pruning methods that lead to sparse subnetworks trainable from scratch, we argue for an orthogonal, under-explored theme: improving training techniques for pruned sub-networks, i.e. sparse training. Apart from the popular belief that only the quality of sparse masks matters for sparse training, in this paper we demonstrate an alternative opportunity: one can carefully customize the sparse training techniques to deviate from the default dense network training protocols, consisting of introducing “ghost” neurons and skip connections at the early stage of training, and strategically modifying the initialization as well as labels. Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks. By adopting our newly curated techniques, we demonstrate significant performance gains across various popular datasets (CIFAR-10, CIFAR-100, TinyImageNet), architectures (ResNet-18/32/104, Vgg16, MobileNet), and sparse mask options (lottery ticket, SNIP/GRASP, SynFlow, or even random pruning), compared to the default training protocols, especially at high sparsity levels. Codes will be publicly available.' volume: 162 URL: https://proceedings.mlr.press/v162/jaiswal22a.html PDF: https://proceedings.mlr.press/v162/jaiswal22a/jaiswal22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jaiswal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ajay Kumar family: Jaiswal - given: Haoyu family: Ma - given: Tianlong family: Chen - given: Ying family: Ding - given: Zhangyang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9833-9844 id: jaiswal22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9833 lastpage: 9844 published: 2022-06-28 00:00:00 +0000 - title: 'Sequential Covariate Shift Detection Using Classifier Two-Sample Tests' abstract: 'A standard assumption in supervised learning is that the training data and test data are from the same distribution. However, this assumption often fails to hold in practice, which can cause the learned model to perform poorly.
We consider the problem of detecting covariate shift, where the covariate distribution shifts but the conditional distribution of labels given covariates remains the same. This problem can naturally be solved using a two-sample test, i.e., testing whether the current test distribution of covariates equals the training distribution of covariates. Our algorithm builds on classifier tests, which train a discriminator to distinguish train and test covariates, and then use the accuracy of this discriminator as a test statistic. A key challenge is that classifier tests assume that a fixed set of test covariates is given. In practice, test covariates often arrive sequentially over time, e.g., a self-driving car observes a stream of images while driving. Furthermore, covariate shift can occur multiple times, i.e., shift and then shift back later or gradually shift over time. To address these challenges, our algorithm trains the discriminator online. Additionally, it evaluates test accuracy using each new covariate before taking a gradient step; this strategy avoids constructing a held-out test set, which can improve sample efficiency. We prove that this optimization preserves correctness, i.e., our algorithm achieves a desired bound on the false positive rate. In our experiments, we show that our algorithm efficiently detects covariate shifts on multiple datasets: ImageNet, IWildCam, and Py150.' volume: 162 URL: https://proceedings.mlr.press/v162/jang22a.html PDF: https://proceedings.mlr.press/v162/jang22a/jang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sooyong family: Jang - given: Sangdon family: Park - given: Insup family: Lee - given: Osbert family: Bastani editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9845-9880 id: jang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9845 lastpage: 9880 published: 2022-06-28 00:00:00 +0000 - title: 'Surrogate Likelihoods for Variational Annealed Importance Sampling' abstract: 'Variational inference is a powerful paradigm for approximate Bayesian inference with a number of appealing properties, including support for model learning and data subsampling. By contrast, MCMC methods like Hamiltonian Monte Carlo do not share these properties but remain attractive since, contrary to parametric methods, MCMC is asymptotically unbiased. For these reasons researchers have sought to combine the strengths of both classes of algorithms, with recent approaches coming closer to realizing this vision in practice. However, supporting data subsampling in these hybrid methods can be a challenge, a shortcoming that we address by introducing a surrogate likelihood that can be learned jointly with other variational parameters. We argue theoretically that the resulting algorithm allows an intuitive trade-off between inference fidelity and computational cost. In an extensive empirical comparison we show that our method performs well in practice and that it is well-suited for black-box inference in probabilistic programming frameworks.'
volume: 162 URL: https://proceedings.mlr.press/v162/jankowiak22a.html PDF: https://proceedings.mlr.press/v162/jankowiak22a/jankowiak22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jankowiak22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Martin family: Jankowiak - given: Du family: Phan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9881-9901 id: jankowiak22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9881 lastpage: 9901 published: 2022-06-28 00:00:00 +0000 - title: 'Planning with Diffusion for Flexible Behavior Synthesis' abstract: 'Model-based reinforcement learning methods often use learning only for the purpose of recovering an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.' volume: 162 URL: https://proceedings.mlr.press/v162/janner22a.html PDF: https://proceedings.mlr.press/v162/janner22a/janner22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-janner22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Michael family: Janner - given: Yilun family: Du - given: Joshua family: Tenenbaum - given: Sergey family: Levine editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9902-9915 id: janner22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9902 lastpage: 9915 published: 2022-06-28 00:00:00 +0000 - title: 'HyperImpute: Generalized Iterative Imputation with Automatic Model Selection' abstract: 'Consider the problem of imputing missing values in a dataset. On the one hand, conventional approaches using iterative imputation benefit from the simplicity and customizability of learning conditional distributions directly, but suffer from the practical requirement for appropriate model specification of each and every variable. On the other hand, recent methods using deep generative modeling benefit from the capacity and efficiency of learning with neural network function approximators, but are often difficult to optimize and rely on stronger data assumptions.
In this work, we study an approach that marries the advantages of both: We propose *HyperImpute*, a generalized iterative imputation framework for adaptively and automatically configuring column-wise models and their hyperparameters. Practically, we provide a concrete implementation with out-of-the-box learners, optimizers, simulators, and extensible interfaces. Empirically, we investigate this framework via comprehensive experiments and sensitivities on a variety of public datasets, and demonstrate its ability to generate accurate imputations relative to a strong suite of benchmarks. Contrary to recent work, we believe our findings constitute a strong defense of the iterative imputation paradigm.' volume: 162 URL: https://proceedings.mlr.press/v162/jarrett22a.html PDF: https://proceedings.mlr.press/v162/jarrett22a/jarrett22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jarrett22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniel family: Jarrett - given: Bogdan C family: Cebere - given: Tennison family: Liu - given: Alicia family: Curth - given: Mihaela prefix: van der family: Schaar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9916-9937 id: jarrett22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9916 lastpage: 9937 published: 2022-06-28 00:00:00 +0000 - title: 'Mitigating Modality Collapse in Multimodal VAEs via Impartial Optimization' abstract: 'A number of variational autoencoders (VAEs) have recently emerged with the aim of modeling multimodal data, e.g., to jointly model images and their corresponding captions. Still, multimodal VAEs tend to focus solely on a subset of the modalities, e.g., by fitting the image while neglecting the caption. We refer to this limitation as modality collapse. In this work, we argue that this effect is a consequence of conflicting gradients during multimodal VAE training. We show how to detect the sub-graphs in the computational graphs where gradients conflict (impartiality blocks), as well as how to leverage existing gradient-conflict solutions from multitask learning to mitigate modality collapse. That is, to ensure impartial optimization across modalities. We apply our training framework to several multimodal VAE models, losses and datasets from the literature, and empirically show that our framework significantly improves the reconstruction performance, conditional generation, and coherence of the latent space across modalities.' 
volume: 162 URL: https://proceedings.mlr.press/v162/javaloy22a.html PDF: https://proceedings.mlr.press/v162/javaloy22a/javaloy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-javaloy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adrian family: Javaloy - given: Maryam family: Meghdadi - given: Isabel family: Valera editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9938-9964 id: javaloy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9938 lastpage: 9964 published: 2022-06-28 00:00:00 +0000 - title: 'Towards understanding how momentum improves generalization in deep learning' abstract: 'Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures. While it is well-understood that using momentum can lead to faster convergence rate in various settings, it has also been observed that momentum yields higher generalization. Prior work argue that momentum stabilizes the SGD noise during training and this leads to higher generalization. In this paper, we adopt another perspective and first empirically show that gradient descent with momentum (GD+M) significantly improves generalization compared to gradient descent (GD) in some deep learning problems. From this observation, we formally study how momentum improves generalization. We devise a binary classification setting where a one-hidden layer (over-parameterized) convolutional neural network trained with GD+M provably generalizes better than the same network trained with GD, when both algorithms are similarly initialized. The key insight in our analysis is that momentum is beneficial in datasets where the examples share some feature but differ in their margin. Contrary to GD that memorizes the small margin data, GD+M still learns the feature in these data thanks to its historical gradients. Lastly, we empirically validate our theoretical findings.' volume: 162 URL: https://proceedings.mlr.press/v162/jelassi22a.html PDF: https://proceedings.mlr.press/v162/jelassi22a/jelassi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jelassi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samy family: Jelassi - given: Yuanzhi family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 9965-10040 id: jelassi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 9965 lastpage: 10040 published: 2022-06-28 00:00:00 +0000 - title: 'MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer' abstract: 'In this paper, we consider cooperative multi-agent reinforcement learning (MARL) with sparse reward. To tackle this problem, we propose a novel method named MASER: MARL with subgoals generated from experience replay buffer. 
Under the widely-used assumption of centralized training with decentralized execution and consistent Q-value decomposition for MARL, MASER automatically generates proper subgoals for multiple agents from the experience replay buffer by considering both individual Q-value and total Q-value. Then, MASER designs an individual intrinsic reward for each agent based on actionable representation relevant to Q-learning so that the agents reach their subgoals while maximizing the joint action value. Numerical results show that MASER significantly outperforms other state-of-the-art MARL algorithms on the StarCraft II micromanagement benchmark.' volume: 162 URL: https://proceedings.mlr.press/v162/jeon22a.html PDF: https://proceedings.mlr.press/v162/jeon22a/jeon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jeon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jeewon family: Jeon - given: Woojun family: Kim - given: Whiyoung family: Jung - given: Youngchul family: Sung editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10041-10052 id: jeon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10041 lastpage: 10052 published: 2022-06-28 00:00:00 +0000 - title: 'An Exact Symbolic Reduction of Linear Smart Predict+Optimize to Mixed Integer Linear Programming' abstract: 'Predictive models are traditionally optimized independently of their use in downstream decision-based optimization. The ‘smart, predict then optimize’ (SPO) framework addresses this shortcoming by optimizing predictive models in order to minimize the final downstream decision loss. To date, several local first-order methods and convex approximations have been proposed. These methods have proven to be effective in practice; however, it remains generally unclear as to how close these local solutions are to global optimality. In this paper, we cast the SPO problem as a bi-level program and apply Symbolic Variable Elimination (SVE) to analytically solve the lower optimization. The resulting program can then be formulated as a mixed-integer linear program (MILP), which is solved to global optimality using standard off-the-shelf solvers. To our knowledge, our framework is the first to provide a globally optimal solution to the linear SPO problem. Experimental results comparing with state-of-the-art local SPO solvers show that the globally optimal solution obtains up to two orders of magnitude reduction in decision regret.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jeong22a.html PDF: https://proceedings.mlr.press/v162/jeong22a/jeong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jeong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jihwan family: Jeong - given: Parth family: Jaggi - given: Andrew family: Butler - given: Scott family: Sanner editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10053-10067 id: jeong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10053 lastpage: 10067 published: 2022-06-28 00:00:00 +0000 - title: 'Agnostic Learnability of Halfspaces via Logistic Loss' abstract: 'We investigate approximation guarantees provided by logistic regression for the fundamental problem of agnostic learning of homogeneous halfspaces. Previously, for a certain broad class of “well-behaved” distributions on the examples, Diakonikolas et al. (2020) proved an $\tilde{\Omega}(\mathrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\mathrm{OPT}})$ upper bound, where OPT denotes the best zero-one/misclassification risk of a homogeneous halfspace. In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\mathrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021). On the other hand, we also show that if we impose a radial-Lipschitzness condition in addition to well-behaved-ness on the distribution, logistic regression on a ball of bounded radius reaches $\tilde{O}(\mathrm{OPT})$ misclassification risk. Our techniques also show that for any well-behaved distribution, regardless of radial Lipschitzness, we can overcome the $\Omega(\sqrt{\mathrm{OPT}})$ lower bound for logistic loss simply at the cost of one additional convex optimization step involving the hinge loss and attain $\tilde{O}(\mathrm{OPT})$ misclassification risk. This two-step convex optimization algorithm is simpler than previous methods obtaining this guarantee, all of which require solving $O(\log(1/\mathrm{OPT}))$ minimization problems.' volume: 162 URL: https://proceedings.mlr.press/v162/ji22a.html PDF: https://proceedings.mlr.press/v162/ji22a/ji22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ji22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ziwei family: Ji - given: Kwangjun family: Ahn - given: Pranjal family: Awasthi - given: Satyen family: Kale - given: Stefani family: Karp editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10068-10103 id: ji22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10068 lastpage: 10103 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Policy Optimization with Generalist-Specialist Learning' abstract: 'Generalization in deep reinforcement learning over unseen environment variations usually requires policy learning over a large set of diverse training variations. 
We empirically observe that an agent trained on many variations (a generalist) tends to learn faster at the beginning, yet its performance plateaus at a less optimal level for a long time. In contrast, an agent trained only on a few variations (a specialist) can often achieve high returns under a limited computational budget. To have the best of both worlds, we propose a novel generalist-specialist training framework. Specifically, we first train a generalist on all environment variations; when it fails to improve, we launch a large population of specialists with weights cloned from the generalist, each trained to master a selected small subset of variations. We finally resume the training of the generalist with auxiliary rewards induced by demonstrations of all specialists. In particular, we investigate the timing to start specialist training and compare strategies to learn generalists with assistance from specialists. We show that this framework pushes the envelope of policy learning on several challenging and popular benchmarks including Procgen, Meta-World and ManiSkill.' volume: 162 URL: https://proceedings.mlr.press/v162/jia22a.html PDF: https://proceedings.mlr.press/v162/jia22a/jia22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jia22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhiwei family: Jia - given: Xuanlin family: Li - given: Zhan family: Ling - given: Shuang family: Liu - given: Yiran family: Wu - given: Hao family: Su editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10104-10119 id: jia22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10104 lastpage: 10119 published: 2022-06-28 00:00:00 +0000 - title: 'Translatotron 2: High-quality direct speech-to-speech translation with voice preservation' abstract: 'We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. Translatotron 2 consists of a speech encoder, a linguistic decoder, an acoustic synthesizer, and a single attention module that connects them together. Experimental results on three datasets consistently show that Translatotron 2 outperforms the original Translatotron by a large margin on both translation quality (up to +15.5 BLEU) and speech generation quality, and approaches that of cascade systems. In addition, we propose a simple method for preserving speakers’ voices from the source speech to the translation speech in a different language. Unlike existing approaches, the proposed method is able to preserve each speaker’s voice on speaker turns without requiring speaker segmentation. Furthermore, compared to existing approaches, it better preserves speakers’ privacy and mitigates potential misuse of voice cloning for creating spoofing audio artifacts.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jia22b.html PDF: https://proceedings.mlr.press/v162/jia22b/jia22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jia22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ye family: Jia - given: Michelle Tadmor family: Ramanovich - given: Tal family: Remez - given: Roi family: Pomerantz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10120-10134 id: jia22b issued: date-parts: - 2022 - 6 - 28 firstpage: 10120 lastpage: 10134 published: 2022-06-28 00:00:00 +0000 - title: 'Online Learning and Pricing with Reusable Resources: Linear Bandits with Sub-Exponential Rewards' abstract: 'We consider a price-based revenue management problem with reusable resources over a finite time horizon $T$. The problem finds important applications in car/bicycle rental, ridesharing, cloud computing, and hospitality management. Customers arrive following a price-dependent Poisson process and each customer requests one unit of $c$ homogeneous reusable resources. If there is an available unit, the customer gets served within a price-dependent exponentially distributed service time; otherwise, she waits in a queue until the next available unit. The decision maker assumes that the inter-arrival and service intervals have an unknown linear dependence on a $d_f$-dimensional feature vector associated with the posted price. We propose a rate-optimal online learning and pricing algorithm, termed Batch Linear Confidence Bound (BLinUCB), and prove that the cumulative regret is $\tilde{O}( d_f \sqrt{T } )$. In establishing the regret, we bound the transient system performance upon price changes via a coupling argument, and also generalize linear bandits to accommodate sub-exponential rewards.' volume: 162 URL: https://proceedings.mlr.press/v162/jia22c.html PDF: https://proceedings.mlr.press/v162/jia22c/jia22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jia22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Huiwen family: Jia - given: Cong family: Shi - given: Siqian family: Shen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10135-10160 id: jia22c issued: date-parts: - 2022 - 6 - 28 firstpage: 10135 lastpage: 10160 published: 2022-06-28 00:00:00 +0000 - title: 'The Role of Deconfounding in Meta-learning' abstract: 'Meta-learning has emerged as a potent paradigm for quick learning of few-shot tasks, by leveraging the meta-knowledge learned from meta-training tasks. Well-generalized meta-knowledge that facilitates fast adaptation in each task is preferred; however, recent evidence suggests the undesirable memorization effect where the meta-knowledge simply memorizing all meta-training tasks discourages task-specific adaptation and poorly generalizes. There have been several solutions to mitigating the effect, including both regularizer-based and augmentation-based methods, while a systematic understanding of these methods in a single framework is still lacking. 
In this paper, we offer a novel causal perspective of meta-learning. Through the lens of causality, we identify the universal label space as a confounder that causes memorization, and we frame the two lines of prevailing methods as different deconfounder approaches. Remarkably, derived from the causal inference principle of front-door adjustment, we propose two frustratingly easy but effective deconfounder algorithms, i.e., sampling multiple versions of the meta-knowledge via Dropout and grouping the meta-knowledge into multiple bins. The proposed causal perspective not only brings in the two deconfounder algorithms that surpass previous works on four benchmark datasets in combating memorization, but also opens a promising direction for meta-learning.' volume: 162 URL: https://proceedings.mlr.press/v162/jiang22a.html PDF: https://proceedings.mlr.press/v162/jiang22a/jiang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jiang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yinjie family: Jiang - given: Zhengyu family: Chen - given: Kun family: Kuang - given: Luotian family: Yuan - given: Xinhai family: Ye - given: Zhihua family: Wang - given: Fei family: Wu - given: Ying family: Wei editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10161-10176 id: jiang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10161 lastpage: 10176 published: 2022-06-28 00:00:00 +0000 - title: 'Subspace Learning for Effective Meta-Learning' abstract: 'Meta-learning aims to extract meta-knowledge from historical tasks to accelerate learning on new tasks. Typical meta-learning algorithms like MAML learn a globally-shared meta-model for all tasks. However, when the task environments are complex, task model parameters are diverse and a common meta-model is insufficient to capture all the meta-knowledge. To address this challenge, in this paper, task model parameters are structured into multiple subspaces, and each subspace represents one type of meta-knowledge. We propose an algorithm to learn the meta-parameters (i.e., subspace bases). We theoretically study the generalization properties of the learned subspaces. Experiments on regression and classification meta-learning datasets verify the effectiveness of the proposed algorithm.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jiang22b.html PDF: https://proceedings.mlr.press/v162/jiang22b/jiang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jiang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weisen family: Jiang - given: James family: Kwok - given: Yu family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10177-10194 id: jiang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 10177 lastpage: 10194 published: 2022-06-28 00:00:00 +0000 - title: 'Optimal Algorithms for Stochastic Multi-Level Compositional Optimization' abstract: 'In this paper, we investigate the problem of stochastic multi-level compositional optimization, where the objective function is a composition of multiple smooth but possibly non-convex functions. Existing methods for solving this problem either suffer from sub-optimal sample complexities or need a huge batch size. To address this limitation, we propose a Stochastic Multi-level Variance Reduction method (SMVR), which achieves the optimal sample complexity of $\mathcal{O}\left(1 / \epsilon^{3}\right)$ to find an $\epsilon$-stationary point for non-convex objectives. Furthermore, when the objective function satisfies the convexity or Polyak-{Ł}ojasiewicz (PL) condition, we propose a stage-wise variant of SMVR and improve the sample complexity to $\mathcal{O}\left(1 / \epsilon^{2}\right)$ for convex functions or $\mathcal{O}\left(1 /(\mu\epsilon)\right)$ for non-convex functions satisfying the $\mu$-PL condition. The latter result implies the same complexity for $\mu$-strongly convex functions. To make use of adaptive learning rates, we also develop Adaptive SMVR, which achieves the same optimal complexities but converges faster in practice. All our complexities match the lower bounds not only in terms of $\epsilon$ but also in terms of $\mu$ (for PL or strongly convex functions), without using a large batch size in each iteration.' volume: 162 URL: https://proceedings.mlr.press/v162/jiang22c.html PDF: https://proceedings.mlr.press/v162/jiang22c/jiang22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jiang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wei family: Jiang - given: Bokun family: Wang - given: Yibo family: Wang - given: Lijun family: Zhang - given: Tianbao family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10195-10216 id: jiang22c issued: date-parts: - 2022 - 6 - 28 firstpage: 10195 lastpage: 10216 published: 2022-06-28 00:00:00 +0000 - title: 'Antibody-Antigen Docking and Design via Hierarchical Structure Refinement' abstract: 'Computational antibody design seeks to automatically create an antibody that binds to an antigen. The binding affinity is governed by the 3D binding interface where antibody residues (paratope) closely interact with antigen residues (epitope). 
Thus, the key question of antibody design is how to predict the 3D paratope-epitope complex (i.e., docking) for paratope generation. In this paper, we propose a new model called Hierarchical Structure Refinement Network (HSRN) for paratope docking and design. During docking, HSRN employs a hierarchical message passing network to predict atomic forces and use them to refine a binding complex in an iterative, equivariant manner. During generation, its autoregressive decoder progressively docks generated paratopes and builds a geometric representation of the binding interface to guide the next residue choice. Our results show that HSRN significantly outperforms prior state-of-the-art on paratope docking and design benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/jin22a.html PDF: https://proceedings.mlr.press/v162/jin22a/jin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wengong family: Jin - given: Dr.Regina family: Barzilay - given: Tommi family: Jaakkola editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10217-10227 id: jin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10217 lastpage: 10227 published: 2022-06-28 00:00:00 +0000 - title: 'Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood' abstract: 'Non-asymptotic analysis of quasi-Newton methods has received a lot of attention recently. In particular, several works have established a non-asymptotic superlinear rate of $\mathcal{O}((1/\sqrt{t})^t)$ for the (classic) BFGS method by exploiting the fact that its error of Newton direction approximation approaches zero. Moreover, a greedy variant of the BFGS method was recently proposed which accelerates the convergence of BFGS by directly approximating the Hessian matrix, instead of the Newton direction, and achieves a fast local quadratic convergence rate. Alas, the local quadratic convergence of Greedy-BFGS requires many more updates compared to the number of iterations that BFGS requires for a local superlinear rate. This is due to the fact that in Greedy-BFGS the Hessian is directly approximated and the Newton direction approximation may not be as accurate as the one for BFGS. In this paper, we close this gap and present a novel BFGS method that has the best of both worlds. More precisely, it leverages the approximation ideas of both BFGS and Greedy-BFGS to properly approximate both the Newton direction and the Hessian matrix. Our theoretical results show that our method outperforms both BFGS and Greedy-BFGS in terms of convergence rate, while it reaches its quadratic convergence rate with fewer steps compared to Greedy-BFGS. Numerical experiments on various datasets also confirm our theoretical findings.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jin22b.html PDF: https://proceedings.mlr.press/v162/jin22b/jin22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jin22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qiujiang family: Jin - given: Alec family: Koppel - given: Ketan family: Rajawat - given: Aryan family: Mokhtari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10228-10250 id: jin22b issued: date-parts: - 2022 - 6 - 28 firstpage: 10228 lastpage: 10250 published: 2022-06-28 00:00:00 +0000 - title: 'The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces' abstract: 'Modern reinforcement learning (RL) commonly engages practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy. While recent progress in RL theory addresses a rich set of RL problems with general function approximation, such successes are mostly restricted to the single-agent setting. It remains elusive how to extend these results to multi-agent RL, especially in the face of new game-theoretical challenges. This paper considers two-player zero-sum Markov Games (MGs). We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension—a new complexity measure adapted from its single-agent version (Jin et al., 2021). A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness. Our theoretical framework is generic and applies to a wide range of models including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations.' volume: 162 URL: https://proceedings.mlr.press/v162/jin22c.html PDF: https://proceedings.mlr.press/v162/jin22c/jin22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jin22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chi family: Jin - given: Qinghua family: Liu - given: Tiancheng family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10251-10279 id: jin22c issued: date-parts: - 2022 - 6 - 28 firstpage: 10251 lastpage: 10279 published: 2022-06-28 00:00:00 +0000 - title: 'Domain Adaptation for Time Series Forecasting via Attention Sharing' abstract: 'Recently, deep neural networks have gained increasing popularity in the field of time series forecasting. A primary reason for their success is their ability to effectively capture complex temporal dynamics across multiple related time series. The advantages of these deep forecasters only start to emerge in the presence of a sufficient amount of data. This poses a challenge for typical forecasting problems in practice, where there is a limited number of time series or observations per time series, or both. 
To cope with this data scarcity issue, we propose a novel domain adaptation framework, Domain Adaptation Forecaster (DAF). DAF leverages statistical strengths from a relevant domain with abundant data samples (source) to improve the performance on the domain of interest with limited data (target). In particular, we use an attention-based shared module with a domain discriminator across domains and private modules for individual domains. We induce domain-invariant latent features (queries and keys) and retrain domain-specific features (values) simultaneously to enable joint training of forecasters on source and target domains. A main insight is that our design of aligning keys allows the target domain to leverage source time series even with different characteristics. Extensive experiments on various domains demonstrate that our proposed method outperforms state-of-the-art baselines on synthetic and real-world datasets, and ablation studies verify the effectiveness of our design choices.' volume: 162 URL: https://proceedings.mlr.press/v162/jin22d.html PDF: https://proceedings.mlr.press/v162/jin22d/jin22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jin22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaoyong family: Jin - given: Youngsuk family: Park - given: Danielle family: Maddix - given: Hao family: Wang - given: Yuyang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10280-10297 id: jin22d issued: date-parts: - 2022 - 6 - 28 firstpage: 10280 lastpage: 10297 published: 2022-06-28 00:00:00 +0000 - title: 'Accelerated Federated Learning with Decoupled Adaptive Optimization' abstract: 'The federated learning (FL) framework enables edge clients to collaboratively learn a shared inference model while keeping privacy of training data on clients. Recently, many heuristic efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings for improving convergence and accuracy. However, there is still a paucity of theoretical principles on where and how to design and utilize adaptive optimization methods in federated settings. This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs). First, an analytic framework is established to build a connection between federated optimization methods and decompositions of ODEs of corresponding centralized optimizers. Second, based on this analytic framework, a momentum decoupling adaptive optimization method, FedDA, is developed to fully utilize the global momentum on each local iteration and accelerate the training convergence. Last but not least, full batch gradients are utilized to mimic centralized optimization at the end of the training process to ensure the convergence and overcome the possible inconsistency caused by adaptive optimization methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jin22e.html PDF: https://proceedings.mlr.press/v162/jin22e/jin22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jin22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiayin family: Jin - given: Jiaxiang family: Ren - given: Yang family: Zhou - given: Lingjuan family: Lyu - given: Ji family: Liu - given: Dejing family: Dou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10298-10322 id: jin22e issued: date-parts: - 2022 - 6 - 28 firstpage: 10298 lastpage: 10322 published: 2022-06-28 00:00:00 +0000 - title: 'Supervised Off-Policy Ranking' abstract: 'Off-policy evaluation (OPE) aims to evaluate a target policy with data generated by other policies. Most previous OPE methods focus on precisely estimating the true performance of a policy. We observe that in many applications, (1) the end goal of OPE is to compare two or multiple candidate policies and choose a good one, which is a much simpler task than precisely evaluating their true performance; and (2) there are usually multiple policies that have been deployed to serve users in real-world systems and thus the true performance of these policies can be known. Inspired by the two observations, in this work, we study a new problem, supervised off-policy ranking (SOPR), which aims to rank a set of target policies based on supervised learning by leveraging off-policy data and policies with known performance. We propose a method to solve SOPR, which learns a policy scoring model by minimizing a ranking loss of the training policies rather than estimating the precise policy performance. The scoring model in our method, a hierarchical Transformer-based model, maps a set of state-action pairs to a score, where the state of each pair comes from the off-policy data and the action is taken by a target policy on the state in an offline manner. Extensive experiments on public datasets show that our method outperforms baseline methods in terms of rank correlation, regret value, and stability. Our code is publicly available on GitHub.' volume: 162 URL: https://proceedings.mlr.press/v162/jin22f.html PDF: https://proceedings.mlr.press/v162/jin22f/jin22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jin22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yue family: Jin - given: Yue family: Zhang - given: Tao family: Qin - given: Xudong family: Zhang - given: Jian family: Yuan - given: Houqiang family: Li - given: Tie-Yan family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10323-10339 id: jin22f issued: date-parts: - 2022 - 6 - 28 firstpage: 10323 lastpage: 10339 published: 2022-06-28 00:00:00 +0000 - title: 'Input-agnostic Certified Group Fairness via Gaussian Parameter Smoothing' abstract: 'Only recently have researchers attempted to provide classification algorithms with provable group fairness guarantees. 
Most of these algorithms are hampered by the requirement that the training and deployment data follow the same distribution. This paper proposes an input-agnostic certified group fairness algorithm, FairSmooth, for improving the fairness of classification models while maintaining remarkable prediction accuracy. A Gaussian parameter smoothing method is developed to transform base classifiers into their smooth versions. An optimal individual smooth classifier is learnt for each group with only the data regarding the group and an overall smooth classifier for all groups is generated by averaging the parameters of all the individual smooth ones. By leveraging the theory of nonlinear functional analysis, the smooth classifiers are reformulated as output functions of a Nemytskii operator. Theoretical analysis is conducted to show that the Nemytskii operator is smooth and induces a Frechet differentiable smooth manifold. We theoretically demonstrate that the smooth manifold has a global Lipschitz constant that is independent of the domain of the input data, which yields the input-agnostic certified group fairness.' volume: 162 URL: https://proceedings.mlr.press/v162/jin22g.html PDF: https://proceedings.mlr.press/v162/jin22g/jin22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jin22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiayin family: Jin - given: Zeru family: Zhang - given: Yang family: Zhou - given: Lingfei family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10340-10361 id: jin22g issued: date-parts: - 2022 - 6 - 28 firstpage: 10340 lastpage: 10361 published: 2022-06-28 00:00:00 +0000 - title: 'Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations' abstract: 'Generating graph-structured data requires learning the underlying distribution of graphs. Yet, this is a challenging problem, and the previous graph generative methods either fail to capture the permutation-invariance property of graphs or cannot sufficiently model the complex dependency between nodes and edges, which is crucial for generating real-world graphs such as molecules. To overcome such limitations, we propose a novel score-based generative model for graphs with a continuous-time framework. Specifically, we propose a new graph diffusion process that models the joint distribution of the nodes and edges through a system of stochastic differential equations (SDEs). Then, we derive novel score matching objectives tailored for the proposed diffusion process to estimate the gradient of the joint log-density with respect to each component, and introduce a new solver for the system of SDEs to efficiently sample from the reverse diffusion process. We validate our graph generation method on diverse datasets, on which it either achieves significantly superior or competitive performance to the baselines. Further analysis shows that our method is able to generate molecules that lie close to the training distribution yet do not violate the chemical valency rule, demonstrating the effectiveness of the system of SDEs in modeling the node-edge relationships.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jo22a.html PDF: https://proceedings.mlr.press/v162/jo22a/jo22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jaehyeong family: Jo - given: Seul family: Lee - given: Sung Ju family: Hwang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10362-10383 id: jo22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10362 lastpage: 10383 published: 2022-06-28 00:00:00 +0000 - title: 'Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits' abstract: 'In pure-exploration problems, information is gathered sequentially to answer a question on the stochastic environment. While best-arm identification for linear bandits has been extensively studied in recent years, few works have been dedicated to identifying one arm that is $\varepsilon$-close to the best one (and not exactly the best one). In this problem with several correct answers, an identification algorithm should focus on one candidate among those answers and verify that it is correct. We demonstrate that picking the answer with highest mean does not allow an algorithm to reach asymptotic optimality in terms of expected sample complexity. Instead, a furthest answer should be identified. Using that insight to choose the candidate answer carefully, we develop a simple procedure to adapt best-arm identification algorithms to tackle $\varepsilon$-best-answer identification in transductive linear stochastic bandits. Finally, we propose an asymptotically optimal algorithm for this setting, which is shown to achieve competitive empirical performance against existing modified best-arm identification algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/jourdan22a.html PDF: https://proceedings.mlr.press/v162/jourdan22a/jourdan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jourdan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Marc family: Jourdan - given: Rémy family: Degenne editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10384-10430 id: jourdan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10384 lastpage: 10430 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees' abstract: 'We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target task. We investigate generalization properties of fine-tuning to understand the problem of overfitting, which often happens in practice. Previous works have shown that constraining the distance from the initialization of fine-tuning improves generalization. Using a PAC-Bayesian analysis, we observe that besides distance from initialization, Hessians affect generalization through the noise stability of deep neural networks against noise injections. 
Motivated by the observation, we develop Hessian distance-based generalization bounds for a wide range of fine-tuning methods. Next, we investigate the robustness of fine-tuning with noisy labels. We design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning. Additionally, we prove a generalization error bound of our algorithm under class conditional independent noise in the training dataset labels. We perform a detailed empirical study of our algorithm on various noisy environments and architectures. For example, on six image classification tasks whose training labels are generated with programmatic labeling, we show a 3.26% accuracy improvement over prior methods. Meanwhile, the Hessian distance measure of the fine-tuned network using our algorithm decreases by six times more than existing approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/ju22a.html PDF: https://proceedings.mlr.press/v162/ju22a/ju22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ju22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haotian family: Ju - given: Dongyue family: Li - given: Hongyang R family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10431-10461 id: ju22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10431 lastpage: 10461 published: 2022-06-28 00:00:00 +0000 - title: 'Robust alignment of cross-session recordings of neural population activity by behaviour via unsupervised domain adaptation' abstract: 'Neural population activity relating to behaviour is assumed to be inherently low-dimensional despite the observed high dimensionality of data recorded using multi-electrode arrays. Therefore, predicting behaviour from neural population recordings has been shown to be most effective when using latent variable models. Over time however, the activity of single neurons can drift, and different neurons will be recorded due to movement of implanted neural probes. This means that a decoder trained to predict behaviour on one day performs worse when tested on a different day. On the other hand, evidence suggests that the latent dynamics underlying behaviour may be stable even over months and years. Based on this idea, we introduce a model capable of inferring behaviourally relevant latent dynamics from previously unseen data recorded from the same animal, without any need for decoder recalibration. We show that unsupervised domain adaptation combined with a sequential variational autoencoder, trained on several sessions, can achieve good generalisation to unseen data and correctly predict behaviour where conventional methods fail. Our results further support the hypothesis that behaviour-related neural dynamics are low-dimensional and stable over time, and will enable more effective and flexible use of brain computer interface technologies.' 
volume: 162 URL: https://proceedings.mlr.press/v162/jude22a.html PDF: https://proceedings.mlr.press/v162/jude22a/jude22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jude22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Justin family: Jude - given: Matthew family: Perich - given: Lee family: Miller - given: Matthias family: Hennig editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10462-10475 id: jude22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10462 lastpage: 10475 published: 2022-06-28 00:00:00 +0000 - title: 'On Measuring Causal Contributions via do-interventions' abstract: 'Causal contributions measure the strengths of different causes to a target quantity. Understanding causal contributions is important in empirical sciences and data-driven disciplines since it allows one to answer practical queries like “what are the contributions of each cause to the effect?” In this paper, we develop a principled method for quantifying causal contributions. First, we provide desiderata of properties (axioms) that causal contribution measures should satisfy and propose the do-Shapley values (inspired by do-interventions [Pearl, 2000]) as a unique method satisfying these properties. Next, we develop a criterion under which the do-Shapley values can be efficiently inferred from non-experimental data. Finally, we provide do-Shapley estimators exhibiting consistency, computational feasibility, and statistical robustness. Simulation results corroborate the theory.' volume: 162 URL: https://proceedings.mlr.press/v162/jung22a.html PDF: https://proceedings.mlr.press/v162/jung22a/jung22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jung22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yonghan family: Jung - given: Shiva family: Kasiviswanathan - given: Jin family: Tian - given: Dominik family: Janzing - given: Patrick family: Bloebaum - given: Elias family: Bareinboim editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10476-10501 id: jung22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10476 lastpage: 10501 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Approximate Inference for Stationary Kernel on Frequency Domain' abstract: 'Based on the Fourier duality between a stationary kernel and its spectral density, modeling the spectral density using a Gaussian mixture density enables one to construct a flexible kernel, known as a Spectral Mixture kernel, that can model any stationary kernel. However, despite its expressive power, training this kernel is typically difficult because scalability and overfitting issues often arise due to a large number of training parameters. To resolve these issues, we propose an approximate inference method for estimating the Spectral Mixture kernel hyperparameters. 
Specifically, we approximate this kernel using a finite set of random spectral points based on Random Fourier Features and optimize the parameters for the distribution of spectral points by sampling-based variational inference. To improve this inference procedure, we analyze the training loss and propose two special methods: a sampling method of spectral points to reduce the error of the approximate kernel in training, and an approximate natural gradient to accelerate the convergence of parameter inference.' volume: 162 URL: https://proceedings.mlr.press/v162/jung22b.html PDF: https://proceedings.mlr.press/v162/jung22b/jung22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-jung22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yohan family: Jung - given: Kyungwoo family: Song - given: Jinkyoo family: Park editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10502-10538 id: jung22b issued: date-parts: - 2022 - 6 - 28 firstpage: 10502 lastpage: 10538 published: 2022-06-28 00:00:00 +0000 - title: 'Sketching Algorithms and Lower Bounds for Ridge Regression' abstract: 'We give a sketching-based iterative algorithm that computes a $1+\varepsilon$ approximate solution for the ridge regression problem $\min_x \|Ax-b\|_2^2 +\lambda\|x\|_2^2$ where $A \in R^{n \times d}$ with $d \ge n$. Our algorithm, for a constant number of iterations (requiring a constant number of passes over the input), improves upon earlier work (Chowdhury et al.) by requiring that the sketching matrix only has a weaker Approximate Matrix Multiplication (AMM) guarantee that depends on $\varepsilon$, along with a constant subspace embedding guarantee. The earlier work instead requires that the sketching matrix has a subspace embedding guarantee that depends on $\varepsilon$. For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m= O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al. with the same number of rows of OSNAP requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|A\|_2$ is the spectral norm of the matrix $A$. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression. Finally, we show that the sketch size required for our algorithm is essentially optimal for a natural framework of algorithms for ridge regression by proving lower bounds on oblivious sketching matrices for AMM. The sketch size lower bounds for AMM may be of independent interest.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kacham22a.html PDF: https://proceedings.mlr.press/v162/kacham22a/kacham22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kacham22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Praneeth family: Kacham - given: David family: Woodruff editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10539-10556 id: kacham22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10539 lastpage: 10556 published: 2022-06-28 00:00:00 +0000 - title: 'Flashlight: Enabling Innovation in Tools for Machine Learning' abstract: 'As the computational requirements for machine learning systems and the size and complexity of machine learning frameworks increases, essential framework innovation has become challenging. While computational needs have driven recent compiler, networking, and hardware advancements, utilization of those advancements by machine learning tools is occurring at a slower pace. This is in part due to the difficulties involved in prototyping new computational paradigms with existing frameworks. Large frameworks prioritize machine learning researchers and practitioners as end users and pay comparatively little attention to systems researchers who can push frameworks forward — we argue that both are equally important stakeholders. We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems by prioritizing open, modular, customizable internals and state-of-the-art, research-ready models and training setups across a variety of domains. Flashlight allows systems researchers to rapidly prototype and experiment with novel ideas in machine learning computation and has low overhead, competing with and often outperforming other popular machine learning frameworks. We see Flashlight as a tool enabling research that can benefit widely used libraries downstream and bring machine learning and systems researchers closer together.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kahn22a.html PDF: https://proceedings.mlr.press/v162/kahn22a/kahn22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kahn22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jacob D family: Kahn - given: Vineel family: Pratap - given: Tatiana family: Likhomanenko - given: Qiantong family: Xu - given: Awni family: Hannun - given: Jeff family: Cai - given: Paden family: Tomasello - given: Ann family: Lee - given: Edouard family: Grave - given: Gilad family: Avidov - given: Benoit family: Steiner - given: Vitaliy family: Liptchinsky - given: Gabriel family: Synnaeve - given: Ronan family: Collobert editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10557-10574 id: kahn22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10557 lastpage: 10574 published: 2022-06-28 00:00:00 +0000 - title: 'Learning-based Optimisation of Particle Accelerators Under Partial Observability Without Real-World Training' abstract: 'In recent work, it has been shown that reinforcement learning (RL) is capable of solving a variety of problems at sometimes super-human performance levels. But despite continued advances in the field, applying RL to complex real-world control and optimisation problems has proven difficult. In this contribution, we demonstrate how to successfully apply RL to the optimisation of a highly complex real-world machine – specifically a linear particle accelerator – in an only partially observable setting and without requiring training on the real machine. Our method outperforms conventional optimisation algorithms in both the achieved result and time taken, while already achieving close to human-level performance. We expect that such automation of machine optimisation will push the limits of operability, increase machine availability and lead to a paradigm shift in how such machines are operated, ultimately facilitating advances in a variety of fields, such as science and medicine among many others.' volume: 162 URL: https://proceedings.mlr.press/v162/kaiser22a.html PDF: https://proceedings.mlr.press/v162/kaiser22a/kaiser22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kaiser22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jan family: Kaiser - given: Oliver family: Stein - given: Annika family: Eichler editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10575-10585 id: kaiser22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10575 lastpage: 10585 published: 2022-06-28 00:00:00 +0000 - title: 'Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning' abstract: 'This work addresses meta-learning (ML) by considering deep networks with stochastic local winner-takes-all (LWTA) activations. This type of network unit results in sparse representations from each model layer, as the units are organized into blocks where only one unit generates a non-zero output. 
The main operating principle of the introduced units relies on stochastic principles, as the network performs posterior sampling over competing units to select the winner. Therefore, the proposed networks are explicitly designed to extract input data representations of sparse stochastic nature, as opposed to the currently standard deterministic representation paradigm. Our approach produces state-of-the-art predictive accuracy on few-shot image classification and regression experiments, as well as reduced predictive error in an active learning setting; these improvements come with an immensely reduced computational cost. Code is available at: https://github.com/Kkalais/StochLWTA-ML' volume: 162 URL: https://proceedings.mlr.press/v162/kalais22a.html PDF: https://proceedings.mlr.press/v162/kalais22a/kalais22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kalais22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Konstantinos family: Kalais - given: Sotirios family: Chatzis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10586-10597 id: kalais22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10586 lastpage: 10597 published: 2022-06-28 00:00:00 +0000 - title: 'Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning' abstract: 'Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where online experimentation is limited. However, depending entirely on logged data, OPE/L is sensitive to environment distribution shifts — discrepancies between the data-generating environment and that where policies are deployed. Si et al. (2020) proposed distributionally robust OPE/L (DROPE/L) to address this, but the proposal relies on inverse-propensity weighting, whose estimation error and regret will deteriorate if propensities are nonparametrically estimated and whose variance is suboptimal even if not. For standard, non-robust, OPE/L, this is solved by doubly robust (DR) methods, but they do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR$^2$OPE) and show that it achieves semiparametric efficiency under weak product rates conditions. Thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for standard OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR$^2$OPL) and show that, under a product rate condition involving a continuum of regressions, it enjoys a fast regret rate of $O(N^{-1/2})$ even when unknown propensities are nonparametrically estimated. We empirically validate our algorithms in simulations and further extend our results to general $f$-divergence uncertainty sets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kallus22a.html PDF: https://proceedings.mlr.press/v162/kallus22a/kallus22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kallus22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nathan family: Kallus - given: Xiaojie family: Mao - given: Kaiwen family: Wang - given: Zhengyuan family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10598-10632 id: kallus22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10598 lastpage: 10632 published: 2022-06-28 00:00:00 +0000 - title: 'Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data' abstract: 'We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study general convex loss functions with the assumption that the distribution of gradients has bounded $k$-th moments. We provide improved upper bounds on the excess population risk under concentrated DP for convex and strongly convex loss functions. Along the way, we derive new algorithms for private mean estimation of heavy-tailed distributions, under both pure and concentrated DP. Finally, we prove nearly-matching lower bounds for private stochastic convex optimization with strongly convex losses and mean estimation, showing new separations between pure and concentrated DP.' volume: 162 URL: https://proceedings.mlr.press/v162/kamath22a.html PDF: https://proceedings.mlr.press/v162/kamath22a/kamath22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kamath22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Gautam family: Kamath - given: Xingtu family: Liu - given: Huanyu family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10633-10660 id: kamath22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10633 lastpage: 10660 published: 2022-06-28 00:00:00 +0000 - title: 'Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning' abstract: 'Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle a huge number of entities. However, the performance of KGE degrades without hyperparameters such as the margin term and number of negative samples in NS loss being appropriately selected. Currently, empirical hyperparameter tuning addresses this problem at the cost of computational time. To solve this problem, we theoretically analyzed NS loss to assist hyperparameter tuning and understand the better use of the NS loss in KGE learning. 
Our theoretical analysis showed that scoring methods with restricted value ranges, such as TransE and RotatE, require appropriate adjustment of the margin term or the number of negative samples different from those without restricted value ranges, such as RESCAL, ComplEx, and DistMult. We also propose subsampling methods specialized for the NS loss in KGE studied from a theoretical aspect. Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.' volume: 162 URL: https://proceedings.mlr.press/v162/kamigaito22a.html PDF: https://proceedings.mlr.press/v162/kamigaito22a/kamigaito22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kamigaito22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hidetaka family: Kamigaito - given: Katsuhiko family: Hayashi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10661-10675 id: kamigaito22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10661 lastpage: 10675 published: 2022-06-28 00:00:00 +0000 - title: 'Matching Learned Causal Effects of Neural Networks with Domain Priors' abstract: 'A trained neural network can be interpreted as a structural causal model (SCM) that provides the effect of changing input variables on the model’s output. However, if training data contains both causal and correlational relationships, a model that optimizes prediction accuracy may not necessarily learn the true causal relationships between input and output variables. On the other hand, expert users often have prior knowledge of the causal relationship between certain input variables and output from domain knowledge. Therefore, we propose a regularization method that aligns the learned causal effects of a neural network with domain priors, including both direct and total causal effects. We show that this approach can generalize to different kinds of domain priors, including monotonicity of causal effect of an input variable on output or zero causal effect of a variable on output for purposes of fairness. Our experiments on twelve benchmark datasets show its utility in regularizing a neural network model to maintain desired causal effects, without compromising on accuracy. Importantly, we also show that a model thus trained is robust and gets improved accuracy on noisy inputs.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kancheti22a.html PDF: https://proceedings.mlr.press/v162/kancheti22a/kancheti22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kancheti22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sai Srinivas family: Kancheti - given: Abbavaram Gowtham family: Reddy - given: Vineeth N family: Balasubramanian - given: Amit family: Sharma editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10676-10696 id: kancheti22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10676 lastpage: 10696 published: 2022-06-28 00:00:00 +0000 - title: 'Deduplicating Training Data Mitigates Privacy Risks in Language Models' abstract: 'Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the success of these attacks is largely due to duplication in commonly used web-scraped training sets. We first show that the rate at which language models regenerate training sequences is superlinearly related to a sequence’s count in the training set. For instance, a sequence that is present 10 times in the training data is on average generated 1000x more often than a sequence that is present only once. We next show that existing methods for detecting memorized sequences have near-chance accuracy on non-duplicated training sequences. Finally, we find that after applying methods to deduplicate training data, language models are considerably more secure against these types of privacy attacks. Taken together, our results motivate an increased focus on deduplication in privacy-sensitive applications and a reevaluation of the practicality of existing privacy attacks.' volume: 162 URL: https://proceedings.mlr.press/v162/kandpal22a.html PDF: https://proceedings.mlr.press/v162/kandpal22a/kandpal22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kandpal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nikhil family: Kandpal - given: Eric family: Wallace - given: Colin family: Raffel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10697-10707 id: kandpal22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10697 lastpage: 10707 published: 2022-06-28 00:00:00 +0000 - title: 'Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control' abstract: 'Learned models and policies can generalize effectively when evaluated within the distribution of the training data, but can produce unpredictable and erroneous outputs on out-of-distribution inputs. In order to avoid distribution shift when deploying learning-based control algorithms, we seek a mechanism to constrain the agent to states and actions that resemble those that the method was trained on. 
In control theory, Lyapunov stability and control-invariant sets allow us to make guarantees about controllers that stabilize the system around specific states, while in machine learning, density models allow us to estimate the training data distribution. Can we combine these two concepts, producing learning-based control algorithms that constrain the system to in-distribution states using only in-distribution actions? In this paper, we propose to do this by combining concepts from Lyapunov stability and density estimation, introducing Lyapunov density models: a generalization of control Lyapunov functions and density models that provides guarantees about an agent’s ability to stay in-distribution over its entire trajectory.' volume: 162 URL: https://proceedings.mlr.press/v162/kang22a.html PDF: https://proceedings.mlr.press/v162/kang22a/kang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Katie family: Kang - given: Paula family: Gradu - given: Jason J family: Choi - given: Michael family: Janner - given: Claire family: Tomlin - given: Sergey family: Levine editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10708-10733 id: kang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10708 lastpage: 10733 published: 2022-06-28 00:00:00 +0000 - title: 'Forget-free Continual Learning with Winning Subnetworks' abstract: 'Inspired by Lottery Ticket Hypothesis that competitive subnetworks exist within a dense network, we propose a continual learning method referred to as Winning SubNetworks (WSN), which sequentially learns and selects an optimal subnetwork for each task. Specifically, WSN jointly learns the model weights and task-adaptive binary masks pertaining to subnetworks associated with each task whilst attempting to select a small set of weights to be activated (winning ticket) by reusing weights of the prior subnetworks. The proposed method is inherently immune to catastrophic forgetting as each selected subnetwork model does not infringe upon other subnetworks. Binary masks spawned per winning ticket are encoded into one N-bit binary digit mask, then compressed using Huffman coding for a sub-linear increase in network capacity with respect to the number of tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/kang22b.html PDF: https://proceedings.mlr.press/v162/kang22b/kang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haeyong family: Kang - given: Rusty John Lloyd family: Mina - given: Sultan Rizky Hikmawan family: Madjid - given: Jaehong family: Yoon - given: Mark family: Hasegawa-Johnson - given: Sung Ju family: Hwang - given: Chang D. 
family: Yoo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10734-10750 id: kang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 10734 lastpage: 10750 published: 2022-06-28 00:00:00 +0000 - title: 'Differentially Private Approximate Quantiles' abstract: 'In this work we study the problem of differentially private (DP) quantiles, in which, given a dataset $X$ and quantiles $q_1, ..., q_m \in [0,1]$, we want to output $m$ quantile estimations which are as close as possible to the true quantiles and preserve DP. We describe a simple recursive DP algorithm, which we call Approximate Quantiles (AQ), for this task. We give a worst case upper bound on its error, and show that its error is much lower than that of previous implementations on several different datasets. Furthermore, it gets this low error while running two orders of magnitude faster than the best previous implementation.' volume: 162 URL: https://proceedings.mlr.press/v162/kaplan22a.html PDF: https://proceedings.mlr.press/v162/kaplan22a/kaplan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kaplan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haim family: Kaplan - given: Shachar family: Schnapp - given: Uri family: Stemmer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10751-10761 id: kaplan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10751 lastpage: 10761 published: 2022-06-28 00:00:00 +0000 - title: 'Simultaneous Graph Signal Clustering and Graph Learning' abstract: 'Graph learning (GL) aims to infer the topology of an unknown graph from a set of observations on its nodes, i.e., graph signals. While most of the existing GL approaches focus on homogeneous datasets, in many real world applications, data is heterogeneous, where graph signals are clustered and each cluster is associated with a different graph. In this paper, we address the problem of learning multiple graphs from heterogeneous data by formulating an optimization problem for joint graph signal clustering and graph topology inference. In particular, our approach extends spectral clustering by partitioning the graph signals not only based on their pairwise similarities but also their smoothness with respect to the graphs associated with the clusters. The proposed method also learns the representative graph for each cluster using the smoothness of the graph signals with respect to the graph topology. The resulting optimization problem is solved with an efficient block-coordinate descent algorithm and results on simulated and real data indicate the effectiveness of the proposed method.'
volume: 162 URL: https://proceedings.mlr.press/v162/karaaslanli22a.html PDF: https://proceedings.mlr.press/v162/karaaslanli22a/karaaslanli22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-karaaslanli22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Abdullah family: Karaaslanli - given: Selin family: Aviyente editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10762-10772 id: karaaslanli22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10762 lastpage: 10772 published: 2022-06-28 00:00:00 +0000 - title: 'Composing Partial Differential Equations with Physics-Aware Neural Networks' abstract: 'We introduce a compositional physics-aware FInite volume Neural Network (FINN) for learning spatiotemporal advection-diffusion processes. FINN implements a new way of combining the learning abilities of artificial neural networks with physical and structural knowledge from numerical simulation by modeling the constituents of partial differential equations (PDEs) in a compositional manner. Results on both one- and two-dimensional PDEs (Burgers’, diffusion-sorption, diffusion-reaction, Allen{–}Cahn) demonstrate FINN’s superior modeling accuracy and excellent out-of-distribution generalization ability beyond initial and boundary conditions. With only one tenth of the number of parameters on average, FINN outperforms pure machine learning and other state-of-the-art physics-aware models in all cases{—}often even by multiple orders of magnitude. Moreover, FINN outperforms a calibrated physical model when approximating sparse real-world data in a diffusion-sorption scenario, confirming its generalization abilities and showing explanatory potential by revealing the unknown retardation factor of the observed process.' volume: 162 URL: https://proceedings.mlr.press/v162/karlbauer22a.html PDF: https://proceedings.mlr.press/v162/karlbauer22a/karlbauer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-karlbauer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Matthias family: Karlbauer - given: Timothy family: Praditia - given: Sebastian family: Otte - given: Sergey family: Oladyshkin - given: Wolfgang family: Nowak - given: Martin V. family: Butz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10773-10801 id: karlbauer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10773 lastpage: 10801 published: 2022-06-28 00:00:00 +0000 - title: 'Meta-Learning Hypothesis Spaces for Sequential Decision-making' abstract: 'Obtaining reliable, adaptive confidence sets for prediction functions (hypotheses) is a central challenge in sequential decision-making tasks, such as bandits and model-based reinforcement learning. These confidence sets typically rely on prior assumptions on the hypothesis space, e.g., the known kernel of a Reproducing Kernel Hilbert Space (RKHS). Hand-designing such kernels is error prone, and misspecification may lead to poor or unsafe performance. 
In this work, we propose to meta-learn a kernel from offline data (Meta-KeL). For the case where the unknown kernel is a combination of known base kernels, we develop an estimator based on structured sparsity. Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets that, with increasing amounts of offline data, become as tight as those given the true unknown kernel. We demonstrate our approach on the kernelized bandits problem (a.k.a. Bayesian optimization), where we establish regret bounds competitive with those given the true kernel. We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.' volume: 162 URL: https://proceedings.mlr.press/v162/kassraie22a.html PDF: https://proceedings.mlr.press/v162/kassraie22a/kassraie22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kassraie22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Parnian family: Kassraie - given: Jonas family: Rothfuss - given: Andreas family: Krause editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10802-10824 id: kassraie22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10802 lastpage: 10824 published: 2022-06-28 00:00:00 +0000 - title: 'FOCUS: Familiar Objects in Common and Uncommon Settings' abstract: 'Standard training datasets for deep learning often do not contain objects in uncommon and rare settings (e.g., “a plane on water”, “a car in snowy weather”). This can cause models trained on these datasets to incorrectly predict objects that are typical for the context in the image, rather than identifying the objects that are actually present. In this paper, we introduce FOCUS (Familiar Objects in Common and Uncommon Settings), a dataset for stress-testing the generalization power of deep image classifiers. By leveraging the power of modern search engines, we deliberately gather data containing objects in common and uncommon settings; in a wide range of locations, weather conditions, and time of day. We present a detailed analysis of the performance of various popular image classifiers on our dataset and demonstrate a clear drop in accuracy when classifying images in uncommon settings. We also show that finetuning a model on our dataset drastically improves its ability to focus on the object of interest leading to better generalization. Lastly, we leverage FOCUS to machine annotate additional visual attributes for the entirety of ImageNet. We believe that our dataset will aid researchers in understanding the inability of deep models to generalize well to uncommon settings and drive future work on improving their distributional robustness.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kattakinda22a.html PDF: https://proceedings.mlr.press/v162/kattakinda22a/kattakinda22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kattakinda22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Priyatham family: Kattakinda - given: Soheil family: Feizi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10825-10847 id: kattakinda22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10825 lastpage: 10847 published: 2022-06-28 00:00:00 +0000 - title: 'Training OOD Detectors in their Natural Habitats' abstract: 'Out-of-distribution (OOD) detection is important for machine learning models deployed in the wild. Recent methods use auxiliary outlier data to regularize the model for improved OOD detection. However, these approaches make a strong distributional assumption that the auxiliary outlier data is completely separable from the in-distribution (ID) data. In this paper, we propose a novel framework that leverages wild mixture data—that naturally consists of both ID and OOD samples. Such wild data is abundant and arises freely upon deploying a machine learning classifier in their natural habitats. Our key idea is to formulate a constrained optimization problem and to show how to tractably solve it. Our learning objective maximizes the OOD detection rate, subject to constraints on the classification error of ID data and on the OOD error rate of ID examples. We extensively evaluate our approach on common OOD detection tasks and demonstrate superior performance. Code is available at https://github.com/jkatzsam/woods_ood.' volume: 162 URL: https://proceedings.mlr.press/v162/katz-samuels22a.html PDF: https://proceedings.mlr.press/v162/katz-samuels22a/katz-samuels22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-katz-samuels22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Julian family: Katz-Samuels - given: Julia B family: Nakhleh - given: Robert family: Nowak - given: Yixuan family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10848-10865 id: katz-samuels22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10848 lastpage: 10865 published: 2022-06-28 00:00:00 +0000 - title: 'Robustness Implies Generalization via Data-Dependent Generalization Bounds' abstract: 'This paper proves that robustness implies generalization via data-dependent generalization bounds. As a result, robustness and generalization are shown to be connected closely in a data-dependent manner. Our bounds improve previous bounds in two directions, to solve an open problem that has seen little development since 2010. The first is to reduce the dependence on the covering number. The second is to remove the dependence on the hypothesis space. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. 
The experiments on real-world data and theoretical models demonstrate near-exponential improvements in various situations. To achieve these improvements, we do not require additional assumptions on the unknown distribution; instead, we only incorporate an observable and computable property of the training samples. A key technical innovation is an improved concentration bound for multinomial random variables that is of independent interest beyond robustness and generalization.' volume: 162 URL: https://proceedings.mlr.press/v162/kawaguchi22a.html PDF: https://proceedings.mlr.press/v162/kawaguchi22a/kawaguchi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kawaguchi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kenji family: Kawaguchi - given: Zhun family: Deng - given: Kyle family: Luh - given: Jiaoyang family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10866-10894 id: kawaguchi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10866 lastpage: 10894 published: 2022-06-28 00:00:00 +0000 - title: 'Generating Distributional Adversarial Examples to Evade Statistical Detectors' abstract: 'Deep neural networks (DNNs) are known to be highly vulnerable to adversarial examples (AEs) that include malicious perturbations. Assumptions about the statistical differences between natural and adversarial inputs are commonplace in many detection techniques. As a best practice, AE detectors are evaluated against ’adaptive’ attackers who actively perturb their inputs to avoid detection. Due to the difficulties in designing adaptive attacks, however, recent work suggests that most detectors have incomplete evaluation. We aim to fill this gap by designing a generic adaptive attack against detectors: the ’statistical indistinguishability attack’ (SIA). SIA optimizes a novel objective to craft adversarial examples (AEs) that follow the same distribution as the natural inputs with respect to DNN representations. Our objective targets all DNN layers simultaneously as we show that AEs being indistinguishable at one layer might fail to be so at other layers. SIA is formulated around evading distributional detectors that inspect a set of AEs as a whole and is also effective against four individual AE detectors, two dataset shift detectors, and an out-of-distribution sample detector, curated from published works. This suggests that SIA can be a reliable tool for evaluating the security of a range of detectors.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kaya22a.html PDF: https://proceedings.mlr.press/v162/kaya22a/kaya22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kaya22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yigitcan family: Kaya - given: Muhammad Bilal family: Zafar - given: Sergul family: Aydore - given: Nathalie family: Rauschmayr - given: Krishnaram family: Kenthapadi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10895-10911 id: kaya22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10895 lastpage: 10911 published: 2022-06-28 00:00:00 +0000 - title: 'Secure Quantized Training for Deep Learning' abstract: 'We implement training of neural networks in secure multi-party computation (MPC) using quantization commonly used in said setting. We are the first to present an MNIST classifier purely trained in MPC that comes within 0.2 percent of the accuracy of the same convolutional neural network trained via plaintext computation. More concretely, we have trained a network with two convolutional and two dense layers to 99.2% accuracy in 3.5 hours (under one hour for 99% accuracy). We have also implemented AlexNet for CIFAR-10, which converges in a few hours. We develop novel protocols for exponentiation and inverse square root. Finally, we present experiments in a range of MPC security models for up to ten parties, both with honest and dishonest majority as well as semi-honest and malicious security.' volume: 162 URL: https://proceedings.mlr.press/v162/keller22a.html PDF: https://proceedings.mlr.press/v162/keller22a/keller22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-keller22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Marcel family: Keller - given: Ke family: Sun editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10912-10938 id: keller22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10912 lastpage: 10938 published: 2022-06-28 00:00:00 +0000 - title: 'A Convergent and Dimension-Independent Min-Max Optimization Algorithm' abstract: 'We study a variant of a recently introduced min-max optimization framework where the max-player is constrained to update its parameters in a greedy manner until it reaches a first-order stationary point. Our equilibrium definition for this framework depends on a proposal distribution which the min-player uses to choose directions in which to update its parameters. We show that, given a smooth and bounded nonconvex-nonconcave objective function, access to any proposal distribution for the min-player’s updates, and stochastic gradient oracle for the max-player, our algorithm converges to the aforementioned approximate local equilibrium in a number of iterations that does not depend on the dimension. The equilibrium point found by our algorithm depends on the proposal distribution, and when applying our algorithm to train GANs we choose the proposal distribution to be a distribution of stochastic gradients. 
We empirically evaluate our algorithm on challenging nonconvex-nonconcave test-functions and loss functions arising in GAN training. Our algorithm converges on these test functions and, when used to train GANs, trains stably on synthetic and real-world datasets and avoids mode collapse.' volume: 162 URL: https://proceedings.mlr.press/v162/keswani22a.html PDF: https://proceedings.mlr.press/v162/keswani22a/keswani22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-keswani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vijay family: Keswani - given: Oren family: Mangoubi - given: Sushant family: Sachdeva - given: Nisheeth K. family: Vishnoi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10939-10973 id: keswani22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10939 lastpage: 10973 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Network Poisson Models for Behavioural and Neural Spike Train Data' abstract: 'One of the most important and challenging application areas for complex machine learning methods is to predict, characterize and model rich, multi-dimensional, neural data. Recent advances in neural recording techniques have made it possible to monitor the activity of a large number of neurons across different brain regions as animals perform behavioural tasks. This poses the critical challenge of establishing links between neural activity at a microscopic scale, which might for instance represent sensory input, and at a macroscopic scale, which then generates behaviour. Predominant modeling methods apply rather disjoint techniques to these scales; by contrast, we suggest an end-to-end model which exploits recent developments of flexible, but tractable, neural network point-process models to characterize dependencies between stimuli, actions, and neural data. We apply this model to a public dataset collected using Neuropixel probes in mice performing a visually-guided behavioural task as well as a synthetic dataset produced from a hierarchical network model with reciprocally connected sensory and integration circuits intended to characterize animal behaviour in a fixed-duration motion discrimination task. We show that our model outperforms previous approaches and contributes novel insights into the relationships between neural activity and behaviour.' 
volume: 162 URL: https://proceedings.mlr.press/v162/khajehnejad22a.html PDF: https://proceedings.mlr.press/v162/khajehnejad22a/khajehnejad22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-khajehnejad22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Moein family: Khajehnejad - given: Forough family: Habibollahi - given: Richard family: Nock - given: Ehsan family: Arabzadeh - given: Peter family: Dayan - given: Amir family: Dezfouli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10974-10996 id: khajehnejad22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10974 lastpage: 10996 published: 2022-06-28 00:00:00 +0000 - title: 'Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling' abstract: 'Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central location can be prohibitively expensive in terms of the communication cost, and it can also compromise the privacy of each agent’s local behavior policy. In this paper, we consider a federated reinforcement learning framework where multiple agents collaboratively learn a global model, without sharing their individual data and policies. Each agent maintains a local copy of the model and updates it using locally sampled data. Although having N agents enables the sampling of N times more data, it is not clear if it leads to proportional convergence speedup. We propose federated versions of on-policy TD, off-policy TD and Q-learning, and analyze their convergence. For all these algorithms, to the best of our knowledge, we are the first to consider Markovian noise and multiple local updates, and prove a linear convergence speedup with respect to the number of agents. To obtain these results, we show that federated TD and Q-learning are special cases of a general framework for federated stochastic approximation with Markovian noise, and we leverage this framework to provide a unified convergence analysis that applies to all the algorithms.' 
volume: 162 URL: https://proceedings.mlr.press/v162/khodadadian22a.html PDF: https://proceedings.mlr.press/v162/khodadadian22a/khodadadian22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-khodadadian22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sajad family: Khodadadian - given: Pranay family: Sharma - given: Gauri family: Joshi - given: Siva Theja family: Maguluri editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 10997-11057 id: khodadadian22a issued: date-parts: - 2022 - 6 - 28 firstpage: 10997 lastpage: 11057 published: 2022-06-28 00:00:00 +0000 - title: 'Multi-Level Branched Regularization for Federated Learning' abstract: 'A critical challenge of federated learning is data heterogeneity and imbalance across clients, which leads to inconsistency between local networks and unstable convergence of global models. To alleviate the limitations, we propose a novel architectural regularization technique that constructs multiple auxiliary branches in each local model by grafting local and global subnetworks at several different levels and that learns the representations of the main pathway in the local model congruent to the auxiliary hybrid pathways via online knowledge distillation. The proposed technique is effective to robustify the global model even in the non-iid setting and is applicable to various federated learning frameworks conveniently without incurring extra communication costs. We perform comprehensive empirical studies and demonstrate remarkable performance gains in terms of accuracy and efficiency compared to existing methods. The source code is available at our project page.' volume: 162 URL: https://proceedings.mlr.press/v162/kim22a.html PDF: https://proceedings.mlr.press/v162/kim22a/kim22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jinkyu family: Kim - given: Geeho family: Kim - given: Bohyung family: Han editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11058-11073 id: kim22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11058 lastpage: 11073 published: 2022-06-28 00:00:00 +0000 - title: 'Learning fair representation with a parametric integral probability metric' abstract: 'As they have a vital effect on social decision-making, AI algorithms should be not only accurate but also fair. Among various algorithms for fairness AI, learning fair representation (LFR), whose goal is to find a fair representation with respect to sensitive variables such as gender and race, has received much attention. For LFR, the adversarial training scheme is popularly employed as is done in the generative adversarial network type algorithms. The choice of a discriminator, however, is done heuristically without justification. In this paper, we propose a new adversarial training scheme for LFR, where the integral probability metric (IPM) with a specific parametric family of discriminators is used. 
The most notable result of the proposed LFR algorithm is its theoretical guarantee about the fairness of the final prediction model, which has not been considered yet. That is, we derive theoretical relations between the fairness of representation and the fairness of the prediction model built on the top of the representation (i.e., using the representation as the input). Moreover, by numerical experiments, we show that our proposed LFR algorithm is computationally lighter and more stable, and the final prediction model is competitive or superior to other LFR algorithms using more complex discriminators.' volume: 162 URL: https://proceedings.mlr.press/v162/kim22b.html PDF: https://proceedings.mlr.press/v162/kim22b/kim22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dongha family: Kim - given: Kunwoong family: Kim - given: Insung family: Kong - given: Ilsang family: Ohn - given: Yongdai family: Kim editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11074-11101 id: kim22b issued: date-parts: - 2022 - 6 - 28 firstpage: 11074 lastpage: 11101 published: 2022-06-28 00:00:00 +0000 - title: 'Dataset Condensation via Efficient Synthetic-Data Parameterization' abstract: 'The great success of machine learning with massive amounts of data comes at a price of huge computation costs and storage for training and tuning. Recent studies on dataset condensation attempt to reduce the dependence on such massive data by synthesizing a compact training dataset. However, the existing approaches have fundamental limitations in optimization due to the limited representability of synthetic datasets without considering any data regularity characteristics. To this end, we propose a novel condensation framework that generates multiple synthetic data with a limited storage budget via efficient parameterization considering data regularity. We further analyze the shortcomings of the existing gradient matching-based condensation methods and develop an effective optimization technique for improving the condensation of training data information. We propose a unified algorithm that drastically improves the quality of condensed data against the current state-of-the-art on CIFAR-10, ImageNet, and Speech Commands.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kim22c.html PDF: https://proceedings.mlr.press/v162/kim22c/kim22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jang-Hyun family: Kim - given: Jinuk family: Kim - given: Seong Joon family: Oh - given: Sangdoo family: Yun - given: Hwanjun family: Song - given: Joonhyun family: Jeong - given: Jung-Woo family: Ha - given: Hyun Oh family: Song editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11102-11118 id: kim22c issued: date-parts: - 2022 - 6 - 28 firstpage: 11102 lastpage: 11118 published: 2022-06-28 00:00:00 +0000 - title: 'Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance' abstract: 'We propose Guided-TTS, a high-quality text-to-speech (TTS) model that does not require any transcript of target speaker using classifier guidance. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for classifier guidance. Our unconditional diffusion model learns to generate speech without any context from untranscribed speech data. For TTS synthesis, we guide the generative process of the diffusion model with a phoneme classifier trained on a large-scale speech recognition dataset. We present a norm-based scaling method that reduces the pronunciation errors of classifier guidance in Guided-TTS. We show that Guided-TTS achieves a performance comparable to that of the state-of-the-art TTS model, Grad-TTS, without any transcript for LJSpeech. We further demonstrate that Guided-TTS performs well on diverse datasets including a long-form untranscribed dataset.' volume: 162 URL: https://proceedings.mlr.press/v162/kim22d.html PDF: https://proceedings.mlr.press/v162/kim22d/kim22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Heeseung family: Kim - given: Sungwon family: Kim - given: Sungroh family: Yoon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11119-11133 id: kim22d issued: date-parts: - 2022 - 6 - 28 firstpage: 11119 lastpage: 11133 published: 2022-06-28 00:00:00 +0000 - title: 'Variational On-the-Fly Personalization' abstract: 'With the development of deep learning (DL) technologies, the demand for DL-based services on personal devices, such as mobile phones, also increases rapidly. In this paper, we propose a novel personalization method, Variational On-the-Fly Personalization. Compared to the conventional personalization methods that require additional fine-tuning with personal data, the proposed method only requires forwarding a handful of personal data on-the-fly. Assuming even a single personal data can convey the characteristics of a target person, we develop the variational hyper-personalizer to capture the weight distribution of layers that fits the target person. 
In the testing phase, the hyper-personalizer estimates the model’s weights on-the-fly based on personality by forwarding only a small amount of (even a single) personal enrollment data. Hence, the proposed method can perform the personalization without any training software platform and additional cost in the edge device. In experiments, we show our approach can effectively generate reliable personalized models via forwarding (not back-propagating) a handful of samples.' volume: 162 URL: https://proceedings.mlr.press/v162/kim22e.html PDF: https://proceedings.mlr.press/v162/kim22e/kim22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jangho family: Kim - given: Jun-Tae family: Lee - given: Simyung family: Chang - given: Nojun family: Kwak editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11134-11147 id: kim22e issued: date-parts: - 2022 - 6 - 28 firstpage: 11134 lastpage: 11147 published: 2022-06-28 00:00:00 +0000 - title: 'Fisher SAM: Information Geometry and Sharpness Aware Minimisation' abstract: 'Recent sharpness-aware minimisation (SAM) is known to find flat minima which is beneficial for better generalisation with improved robustness. SAM essentially modifies the loss function by the maximum loss value within the small neighborhood around the current iterate. However, it uses the Euclidean ball to define the neighborhood, which can be less accurate since loss functions for neural networks are typically defined over probability distributions (e.g., class predictive probabilities), rendering the parameter space no more Euclidean. In this paper we consider the information geometry of the model parameter space when defining the neighborhood, namely replacing SAM’s Euclidean balls with ellipsoids induced by the Fisher information. Our approach, dubbed Fisher SAM, defines more accurate neighborhood structures that conform to the intrinsic metric of the underlying statistical manifold. For instance, SAM may probe the worst-case loss value at either a too nearby or inappropriately distant point due to the ignorance of the parameter space geometry, which is avoided by our Fisher SAM. Another recent Adaptive SAM approach that stretches/shrinks the Euclidean ball in accordance with the scales of the parameter magnitudes, might be dangerous, potentially destroying the neighborhood structure even severely. We demonstrate the improved performance of the proposed Fisher SAM on several benchmark datasets/tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kim22f.html PDF: https://proceedings.mlr.press/v162/kim22f/kim22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Minyoung family: Kim - given: Da family: Li - given: Shell X family: Hu - given: Timothy family: Hospedales editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11148-11161 id: kim22f issued: date-parts: - 2022 - 6 - 28 firstpage: 11148 lastpage: 11161 published: 2022-06-28 00:00:00 +0000 - title: 'ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder' abstract: 'Vision transformers (ViTs), which have demonstrated a state-of-the-art performance in image classification, can also visualize global interpretations through attention-based contributions. However, the complexity of the model makes it difficult to interpret the decision-making process, and the ambiguity of the attention maps can cause incorrect correlations between image patches. In this study, we propose a new ViT neural tree decoder (ViT-NeT). A ViT acts as a backbone, and to solve its limitations, the output contextual image patches are applied to the proposed NeT. The NeT aims to accurately classify fine-grained objects with similar inter-class correlations and different intra-class correlations. In addition, it describes the decision-making process through a tree structure and prototype and enables a visual interpretation of the results. The proposed ViT-NeT is designed to not only improve the classification performance but also provide a human-friendly interpretation, which is effective in resolving the trade-off between performance and interpretability. We compared the performance of ViT-NeT with other state-of-the-art methods using widely used fine-grained visual categorization benchmark datasets and experimentally proved that the proposed method is superior in terms of the classification performance and interpretability. The code and models are publicly available at https://github.com/jumpsnack/ViT-NeT.' volume: 162 URL: https://proceedings.mlr.press/v162/kim22g.html PDF: https://proceedings.mlr.press/v162/kim22g/kim22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sangwon family: Kim - given: Jaeyeal family: Nam - given: Byoung Chul family: Ko editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11162-11172 id: kim22g issued: date-parts: - 2022 - 6 - 28 firstpage: 11162 lastpage: 11172 published: 2022-06-28 00:00:00 +0000 - title: 'Sanity Simulations for Saliency Methods' abstract: 'Saliency methods are a popular class of feature attribution explanation methods that aim to capture a model’s predictive reasoning by identifying "important" pixels in an input image. However, the development and adoption of these methods are hindered by the lack of access to ground-truth model reasoning, which prevents accurate evaluation.
In this work, we design a synthetic benchmarking framework, SMERF, that allows us to perform ground-truth-based evaluation while controlling the complexity of the model’s reasoning. Experimentally, SMERF reveals significant limitations in existing saliency methods and, as a result, represents a useful tool for the development of new saliency methods.' volume: 162 URL: https://proceedings.mlr.press/v162/kim22h.html PDF: https://proceedings.mlr.press/v162/kim22h/kim22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Joon Sik family: Kim - given: Gregory family: Plumb - given: Ameet family: Talwalkar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11173-11200 id: kim22h issued: date-parts: - 2022 - 6 - 28 firstpage: 11173 lastpage: 11200 published: 2022-06-28 00:00:00 +0000 - title: 'Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation' abstract: 'Recent advances in diffusion models bring state-of-the-art performance on image generation tasks. However, empirical results from previous research in diffusion models imply an inverse correlation between density estimation and sample generation performances. This paper investigates with sufficient empirical evidence that such inverse correlation happens because density estimation is significantly contributed by small diffusion time, whereas sample generation mainly depends on large diffusion time. However, training a score network well across the entire diffusion time is demanding because the loss scale is significantly imbalanced at each diffusion time. For successful training, therefore, we introduce Soft Truncation, a universally applicable training technique for diffusion models, that softens the fixed and static truncation hyperparameter into a random variable. In experiments, Soft Truncation achieves state-of-the-art performance on CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/kim22i.html PDF: https://proceedings.mlr.press/v162/kim22i/kim22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dongjun family: Kim - given: Seungjae family: Shin - given: Kyungwoo family: Song - given: Wanmo family: Kang - given: Il-Chul family: Moon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11201-11228 id: kim22i issued: date-parts: - 2022 - 6 - 28 firstpage: 11201 lastpage: 11228 published: 2022-06-28 00:00:00 +0000 - title: 'Rotting Infinitely Many-Armed Bandits' abstract: 'We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. 
We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T, \sqrt{T}\})$ worst-case regret lower bound where $T$ is the time horizon. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T, \sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T, T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.' volume: 162 URL: https://proceedings.mlr.press/v162/kim22j.html PDF: https://proceedings.mlr.press/v162/kim22j/kim22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jung-Hun family: Kim - given: Milan family: Vojnovic - given: Se-Young family: Yun editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11229-11254 id: kim22j issued: date-parts: - 2022 - 6 - 28 firstpage: 11229 lastpage: 11254 published: 2022-06-28 00:00:00 +0000 - title: 'Accelerated Gradient Methods for Geodesically Convex Optimization: Tractable Algorithms and Convergence Analysis' abstract: 'We propose computationally tractable accelerated first-order methods for Riemannian optimization, extending the Nesterov accelerated gradient (NAG) method. For both geodesically convex and geodesically strongly convex objective functions, our algorithms are shown to have the same iteration complexities as those for the NAG method on Euclidean spaces, under only standard assumptions. To the best of our knowledge, the proposed scheme is the first fully accelerated method for geodesically convex optimization problems. Our convergence analysis makes use of novel metric distortion lemmas as well as carefully designed potential functions. A connection with the continuous-time dynamics for modeling Riemannian acceleration in (Alimisis et al., 2020) is also identified by letting the stepsize tend to zero. We validate our theoretical results through numerical experiments.' 
volume: 162 URL: https://proceedings.mlr.press/v162/kim22k.html PDF: https://proceedings.mlr.press/v162/kim22k/kim22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kim22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jungbin family: Kim - given: Insoon family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11255-11282 id: kim22k issued: date-parts: - 2022 - 6 - 28 firstpage: 11255 lastpage: 11282 published: 2022-06-28 00:00:00 +0000 - title: 'Generalizing to New Physical Systems via Context-Informed Dynamics Model' abstract: 'Data-driven approaches to modeling physical systems fail to generalize to unseen systems that share the same general dynamics with the learning domain, but correspond to different physical contexts. We propose a new framework for this key problem, context-informed dynamics adaptation (CoDA), which takes into account the distributional shift across systems for fast and efficient adaptation to new dynamics. CoDA leverages multiple environments, each associated to a different dynamic, and learns to condition the dynamics model on contextual parameters, specific to each environment. The conditioning is performed via a hypernetwork, learned jointly with a context vector from observed data. The proposed formulation constrains the search hypothesis space for fast adaptation and better generalization across environments with few samples. We theoretically motivate our approach and show state-of-the-art generalization results on a set of nonlinear dynamics, representative of a variety of application domains. We also show, on these systems, that new system parameters can be inferred from context vectors with minimal supervision.' volume: 162 URL: https://proceedings.mlr.press/v162/kirchmeyer22a.html PDF: https://proceedings.mlr.press/v162/kirchmeyer22a/kirchmeyer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kirchmeyer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Matthieu family: Kirchmeyer - given: Yuan family: Yin - given: Jeremie family: Dona - given: Nicolas family: Baskiotis - given: Alain family: Rakotomamonjy - given: Patrick family: Gallinari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11283-11301 id: kirchmeyer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11283 lastpage: 11301 published: 2022-06-28 00:00:00 +0000 - title: 'SoQal: Selective Oracle Questioning for Consistency Based Active Learning of Cardiac Signals' abstract: 'Clinical settings are often characterized by abundant unlabelled data and limited labelled data. This is typically driven by the high burden placed on oracles (e.g., physicians) to provide annotations. One way to mitigate this burden is via active learning (AL) which involves the (a) acquisition and (b) annotation of informative unlabelled instances. Whereas previous work addresses either one of these elements independently, we propose an AL framework that addresses both. 
For acquisition, we propose Bayesian Active Learning by Consistency (BALC), a sub-framework which perturbs both instances and network parameters and quantifies changes in the network output probability distribution. For annotation, we propose SoQal, a sub-framework that dynamically determines whether, for each acquired unlabelled instance, to request a label from an oracle or to pseudo-label it instead. We show that BALC can outperform state-of-the-art acquisition functions such as BALD, and SoQal outperforms baseline methods even in the presence of a noisy oracle.' volume: 162 URL: https://proceedings.mlr.press/v162/kiyasseh22a.html PDF: https://proceedings.mlr.press/v162/kiyasseh22a/kiyasseh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kiyasseh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dani family: Kiyasseh - given: Tingting family: Zhu - given: David A family: Clifton editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11302-11340 id: kiyasseh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11302 lastpage: 11340 published: 2022-06-28 00:00:00 +0000 - title: 'Curriculum Reinforcement Learning via Constrained Optimal Transport' abstract: 'Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequence of learning tasks, starting from easy ones and subsequently increasing their difficulty. Although the potential of curricula in RL has been clearly shown in a variety of works, it is less clear how to generate them for a given learning environment, resulting in a variety of methods aiming to automate this task. In this work, we focus on the idea of framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL. Identifying key issues of existing methods, we frame the generation of a curriculum as a constrained optimal transport problem between task distributions. Benchmarks show that this way of curriculum generation can improve upon existing CRL methods, yielding high performance in a variety of tasks with different characteristics.' volume: 162 URL: https://proceedings.mlr.press/v162/klink22a.html PDF: https://proceedings.mlr.press/v162/klink22a/klink22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-klink22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pascal family: Klink - given: Haoyi family: Yang - given: Carlo family: D’Eramo - given: Jan family: Peters - given: Joni family: Pajarinen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11341-11358 id: klink22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11341 lastpage: 11358 published: 2022-06-28 00:00:00 +0000 - title: 'Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups' abstract: 'Group convolutional neural networks (G-CNNs) have been shown to increase parameter efficiency and model accuracy by incorporating geometric inductive biases.
In this work, we investigate the properties of representations learned by regular G-CNNs, and show considerable parameter redundancy in group convolution kernels. This finding motivates further weight-tying by sharing convolution kernels over subgroups. To this end, we introduce convolution kernels that are separable over the subgroup and channel dimensions. In order to obtain equivariance to arbitrary affine Lie groups we provide a continuous parameterisation of separable convolution kernels. We evaluate our approach across several vision datasets, and show that our weight sharing leads to improved performance and computational efficiency. In many settings, separable G-CNNs outperform their non-separable counterpart, while only using a fraction of their training time. In addition, thanks to the increase in computational efficiency, we are able to implement G-CNNs equivariant to the $\mathrm{Sim(2)}$ group; the group of dilations, rotations and translations of the plane. $\mathrm{Sim(2)}$-equivariance further improves performance on all tasks considered, and achieves state-of-the-art performance on rotated MNIST.' volume: 162 URL: https://proceedings.mlr.press/v162/knigge22a.html PDF: https://proceedings.mlr.press/v162/knigge22a/knigge22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-knigge22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: David M. family: Knigge - given: David W family: Romero - given: Erik J family: Bekkers editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11359-11386 id: knigge22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11359 lastpage: 11386 published: 2022-06-28 00:00:00 +0000 - title: 'Revisiting Contrastive Learning through the Lens of Neighborhood Component Analysis: an Integrated Framework' abstract: 'As a seminal tool in self-supervised representation learning, contrastive learning has gained unprecedented attention in recent years. In essence, contrastive learning aims to leverage pairs of positive and negative samples for representation learning, which relates to exploiting neighborhood information in a feature space. By investigating the connection between contrastive learning and neighborhood component analysis (NCA), we provide a novel stochastic nearest neighbor viewpoint of contrastive learning and subsequently propose a series of contrastive losses that outperform the existing ones. Under our proposed framework, we show a new methodology to design integrated contrastive losses that could simultaneously achieve good accuracy and robustness on downstream tasks. With the integrated framework, we achieve up to 6% improvement on the standard accuracy and 17% improvement on the robust accuracy.' 
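The stochastic nearest-neighbour view of contrastive learning described in the abstract above can be made concrete with a small sketch. The following is illustrative only (it is not the authors' code, and the specific losses proposed in the paper may differ): an NCA-style objective in which each anchor selects a neighbour with softmax probability over embedding similarities, and the loss rewards selecting a positive of the same class. The function name `nca_contrastive_loss`, the temperature `tau`, and the toy data are assumptions made for the example.

```python
# Illustrative sketch only: an NCA-style stochastic nearest-neighbour
# contrastive loss (not the paper's implementation; names and defaults
# are assumptions).
import torch
import torch.nn.functional as F

def nca_contrastive_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z: (N, d) embeddings; labels: (N,) integer class labels; tau: temperature."""
    z = F.normalize(z, dim=1)                          # cosine similarity via unit vectors
    sim = (z @ z.t()) / tau                            # (N, N) similarity logits
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float('-inf'))          # an anchor never selects itself
    log_p = F.log_softmax(sim, dim=1)                  # stochastic neighbour distribution p(j | i)
    pos = (labels[:, None] == labels[None, :]) & ~eye  # same-class neighbours
    # log-probability that anchor i selects *some* positive neighbour
    log_p_pos = torch.logsumexp(log_p.masked_fill(~pos, float('-inf')), dim=1)
    return -log_p_pos.mean()

# Toy usage: 8 embeddings, 4 classes with 2 samples each.
z = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = nca_contrastive_loss(z, labels)
loss.backward()
```

Minimizing this objective pushes same-class samples to be each other's likely nearest neighbours, which is the NCA intuition the abstract alludes to.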
volume: 162 URL: https://proceedings.mlr.press/v162/ko22a.html PDF: https://proceedings.mlr.press/v162/ko22a/ko22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ko22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ching-Yun family: Ko - given: Jeet family: Mohapatra - given: Sijia family: Liu - given: Pin-Yu family: Chen - given: Luca family: Daniel - given: Lily family: Weng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11387-11412 id: ko22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11387 lastpage: 11412 published: 2022-06-28 00:00:00 +0000 - title: 'Transfer Learning In Differential Privacy’s Hybrid-Model' abstract: 'The hybrid-model (Avent et al., 2017) in Differential Privacy is an augmentation of the local-model where in addition to $N$ local-agents we are assisted by one special agent who is in fact a curator holding the sensitive details of $n$ additional individuals. Here we study the problem of machine learning in the hybrid-model where the $n$ individuals in the curator’s dataset are drawn from a different distribution than that of the general population (the local-agents). We give a general scheme – Subsample-Test-Reweigh – for this transfer learning problem, which reduces any curator-model learner to a learner in the hybrid-model using iterative subsampling and reweighing of the $n$ examples held by the curator based on a smooth variation (introduced by Bun et al., 2020) of the Multiplicative-Weights algorithm. Our scheme has a sample complexity which relies on the $\chi^2$-divergence between the two distributions. We give worst-case analysis bounds on the sample complexity required for our private reduction. Aiming to reduce said sample complexity, we give two specific instances where our sample complexity can be drastically reduced (one instance is analyzed mathematically, the other empirically) and pose several directions for follow-up work.' volume: 162 URL: https://proceedings.mlr.press/v162/kohen22a.html PDF: https://proceedings.mlr.press/v162/kohen22a/kohen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kohen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Refael family: Kohen - given: Or family: Sheffet editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11413-11429 id: kohen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11413 lastpage: 11429 published: 2022-06-28 00:00:00 +0000 - title: 'Markov Chain Monte Carlo for Continuous-Time Switching Dynamical Systems' abstract: 'Switching dynamical systems are an expressive model class for the analysis of time-series data. As in many fields within the natural and engineering sciences, the systems under study typically evolve continuously in time, so it is natural to consider continuous-time model formulations consisting of switching stochastic differential equations governed by an underlying Markov jump process.
Inference in these types of models is, however, notoriously difficult, and tractable computational schemes are rare. In this work, we propose a novel inference algorithm utilizing a Markov Chain Monte Carlo approach. The presented Gibbs sampler allows us to efficiently obtain samples from the exact continuous-time posterior processes. Our framework naturally enables Bayesian parameter estimation, and we also include an estimate for the diffusion covariance, which is oftentimes assumed fixed in stochastic differential equation models. We evaluate our framework under the modeling assumption and compare it against an existing variational inference approach.' volume: 162 URL: https://proceedings.mlr.press/v162/kohs22a.html PDF: https://proceedings.mlr.press/v162/kohs22a/kohs22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kohs22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lukas family: Köhs - given: Bastian family: Alt - given: Heinz family: Koeppl editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11430-11454 id: kohs22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11430 lastpage: 11454 published: 2022-06-28 00:00:00 +0000 - title: 'Partial disentanglement for domain adaptation' abstract: 'Unsupervised domain adaptation is critical to many real-world applications where label information is unavailable in the target domain. In general, without further assumptions, the joint distribution of the features and the label is not identifiable in the target domain. To address this issue, we rely on a property of minimal changes of causal mechanisms across domains to minimize unnecessary influences of domain shift. To encode this property, we first formulate the data generating process using a latent variable model with two partitioned latent subspaces: invariant components whose distributions stay the same across domains, and sparse changing components that vary across domains. We further constrain the domain shift to have a restrictive influence on the changing components. Under mild conditions, we show that the latent variables are partially identifiable, from which it follows that the joint distribution of data and labels in the target domain is also identifiable. Given the theoretical insights, we propose a practical domain adaptation framework, called iMSDA. Extensive experimental results reveal that iMSDA outperforms state-of-the-art domain adaptation algorithms on benchmark datasets, demonstrating the effectiveness of our framework.'
volume: 162 URL: https://proceedings.mlr.press/v162/kong22a.html PDF: https://proceedings.mlr.press/v162/kong22a/kong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lingjing family: Kong - given: Shaoan family: Xie - given: Weiran family: Yao - given: Yujia family: Zheng - given: Guangyi family: Chen - given: Petar family: Stojanov - given: Victor family: Akinwande - given: Kun family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11455-11472 id: kong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11455 lastpage: 11472 published: 2022-06-28 00:00:00 +0000 - title: 'Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback' abstract: 'The problem of online learning with graph feedback has been extensively studied in the literature due to its generality and potential to model various learning tasks. Existing works mainly study the adversarial and stochastic feedback separately. If the prior knowledge of the feedback mechanism is unavailable or wrong, such specially designed algorithms could suffer great loss. To avoid this problem, \citet{erez2021towards} try to optimize for both environments. However, they assume the feedback graphs are undirected and each vertex has a self-loop, which compromises the generality of the framework and may not be satisfied in applications. With a general feedback graph, the observation of an arm may not be available when this arm is pulled, which makes the exploration more expensive and the algorithms more challenging to perform optimally in both environments. In this work, we overcome this difficulty by a new trade-off mechanism with a carefully-designed proportion for exploration and exploitation. We prove the proposed algorithm simultaneously achieves $\mathrm{poly} \log T$ regret in the stochastic setting and minimax-optimal regret of $\tilde{O}(T^{2/3})$ in the adversarial setting where $T$ is the horizon and $\tilde{O}$ hides parameters independent of $T$ as well as logarithmic terms. To our knowledge, this is the first best-of-both-worlds result for general feedback graphs.' volume: 162 URL: https://proceedings.mlr.press/v162/kong22b.html PDF: https://proceedings.mlr.press/v162/kong22b/kong22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kong22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fang family: Kong - given: Yichi family: Zhou - given: Shuai family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11473-11482 id: kong22b issued: date-parts: - 2022 - 6 - 28 firstpage: 11473 lastpage: 11482 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Data Analysis with Correlated Observations' abstract: 'The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. 
Several approaches and tools have been successfully applied in this context, such as differential privacy, max-information, compression arguments, and more. The situation is far less well-understood without the independence assumption. We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. First, we show that, in some cases, differential privacy guarantees generalization even when there are dependencies within the sample, which we quantify using a notion we call Gibbs-dependence. We complement this result with a tight negative example. Second, we show that the connection between transcript-compression and adaptive data analysis can be extended to the non-iid setting.' volume: 162 URL: https://proceedings.mlr.press/v162/kontorovich22a.html PDF: https://proceedings.mlr.press/v162/kontorovich22a/kontorovich22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kontorovich22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aryeh family: Kontorovich - given: Menachem family: Sadigurschi - given: Uri family: Stemmer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11483-11498 id: kontorovich22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11483 lastpage: 11498 published: 2022-06-28 00:00:00 +0000 - title: 'Controlling Conditional Language Models without Catastrophic Forgetting' abstract: 'Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However, due to their generic training methodology, these models often fail to meet some of the downstream requirements (e.g., hallucinations in abstractive summarization or style violations in code generation). This raises the important question of how to adapt pre-trained generative models to meet all requirements without destroying their general capabilities ("catastrophic forgetting"). Recent work has proposed to solve this problem by representing task-specific requirements through energy-based models (EBMs) and approximating these EBMs using distributional policy gradients (DPG). Despite its effectiveness, this approach is limited to unconditional distributions. In this paper, we extend DPG to conditional tasks by proposing Conditional DPG (CDPG). We evaluate CDPG on four different control objectives across three tasks (translation, summarization and code generation) and two pretrained models (T5 and GPT-Neo). Our results show that fine-tuning using CDPG robustly moves these pretrained models closer towards meeting control objectives and — in contrast with baseline approaches — does not result in catastrophic forgetting.'
volume: 162 URL: https://proceedings.mlr.press/v162/korbak22a.html PDF: https://proceedings.mlr.press/v162/korbak22a/korbak22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-korbak22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tomasz family: Korbak - given: Hady family: Elsahar - given: German family: Kruszewski - given: Marc family: Dymetman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11499-11528 id: korbak22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11499 lastpage: 11528 published: 2022-06-28 00:00:00 +0000 - title: 'Batch Greenkhorn Algorithm for Entropic-Regularized Multimarginal Optimal Transport: Linear Rate of Convergence and Iteration Complexity' abstract: 'In this work we propose a batch multimarginal version of the Greenkhorn algorithm for the entropic-regularized optimal transport problem. This framework is general enough to cover, as particular cases, existing Sinkhorn and Greenkhorn algorithms for the bi-marginal setting, and greedy MultiSinkhorn for the general multimarginal case. We provide a comprehensive convergence analysis based on the properties of the iterative Bregman projections method with greedy control. Linear rate of convergence as well as explicit bounds on the iteration complexity are obtained. When specialized to the above-mentioned algorithms, our results give new convergence rates or provide key improvements over the state-of-the-art rates. We present numerical experiments showing that the flexibility of the batch can be exploited to improve the performance of the Sinkhorn algorithm both in bi-marginal and multimarginal settings.' volume: 162 URL: https://proceedings.mlr.press/v162/kostic22a.html PDF: https://proceedings.mlr.press/v162/kostic22a/kostic22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kostic22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vladimir R. family: Kostic - given: Saverio family: Salzo - given: Massimiliano family: Pontil editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11529-11558 id: kostic22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11529 lastpage: 11558 published: 2022-06-28 00:00:00 +0000 - title: 'Certified Adversarial Robustness Under the Bounded Support Set' abstract: 'Deep neural networks (DNNs) have revealed severe vulnerability to adversarial perturbations; besides empirical adversarial training for robustness, the design of provably robust classifiers has attracted more and more attention. Randomized smoothing methods provide certified robustness with an agnostic architecture, which is further extended to a provable robustness framework using f-divergence. However, these methods cannot be applied to smoothing measures with a bounded support set, such as the uniform probability measure, due to the use of likelihood ratios in their certification methods.
In this paper, we generalize the $f$-divergence-based framework to a Wasserstein-distance-based and total-variation-distance-based framework that is the first able to analyze robustness properties of bounded support set smoothing measures both theoretically and experimentally. By applying our methodology to uniform probability measures with support set $l_p (p=1,2,\infty\text{ and general})$ ball, we prove negative certified robustness properties with respect to $l_q (q=1, 2, \infty)$ perturbations and present experimental results on the CIFAR-10 dataset with ResNet to validate our theory. It is also worth mentioning that our certification procedure only costs constant computation time.' volume: 162 URL: https://proceedings.mlr.press/v162/kou22a.html PDF: https://proceedings.mlr.press/v162/kou22a/kou22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kou22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yiwen family: Kou - given: Qinyuan family: Zheng - given: Yisen family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11559-11597 id: kou22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11559 lastpage: 11597 published: 2022-06-28 00:00:00 +0000 - title: 'Exact Learning of Preference Structure: Single-peaked Preferences and Beyond' abstract: 'We consider the setting where the members of a society (voters) have preferences over candidates, and the candidates can be ordered on an axis so that the voters’ preferences are single-peaked on this axis. We ask whether this axis can be identified by sampling the voters’ preferences. For several natural distributions, we obtain tight bounds on the number of samples required and show that, surprisingly, the bounds are independent of the number of candidates. We extend our results to the case where voters’ preferences are sampled from two different axes over the same candidate set (one of which may be known). We also consider two alternative models of learning: (1) sampling pairwise comparisons rather than entire votes, and (2) learning from equivalence queries.' volume: 162 URL: https://proceedings.mlr.press/v162/kraiczy22a.html PDF: https://proceedings.mlr.press/v162/kraiczy22a/kraiczy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kraiczy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sonja family: Kraiczy - given: Edith family: Elkind editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11598-11612 id: kraiczy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11598 lastpage: 11612 published: 2022-06-28 00:00:00 +0000 - title: 'Reconstructing Nonlinear Dynamical Systems from Multi-Modal Time Series' abstract: 'Empirically observed time series in physics, biology, or medicine are commonly generated by some underlying dynamical system (DS) which is the target of scientific interest. There is an increasing interest in harvesting machine learning methods to reconstruct this latent DS in a data-driven, unsupervised way.
In many areas of science it is common to sample time series observations from many data modalities simultaneously, e.g. electrophysiological and behavioral time series in a typical neuroscience experiment. However, current machine learning tools for reconstructing DSs usually focus on just one data modality. Here we propose a general framework for multi-modal data integration for the purpose of nonlinear DS reconstruction and the analysis of cross-modal relations. This framework is based on dynamically interpretable recurrent neural networks as general approximators of nonlinear DSs, coupled to sets of modality-specific decoder models from the class of generalized linear models. Both an expectation-maximization and a variational inference algorithm for model training are advanced and compared. We show on nonlinear DS benchmarks that our algorithms can efficiently compensate for too noisy or missing information in one data channel by exploiting other channels, and demonstrate on experimental neuroscience data how the algorithm learns to link different data domains to the underlying dynamics.' volume: 162 URL: https://proceedings.mlr.press/v162/kramer22a.html PDF: https://proceedings.mlr.press/v162/kramer22a/kramer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kramer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniel family: Kramer - given: Philine L family: Bommer - given: Carlo family: Tombolini - given: Georgia family: Koppe - given: Daniel family: Durstewitz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11613-11633 id: kramer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11613 lastpage: 11633 published: 2022-06-28 00:00:00 +0000 - title: 'Probabilistic ODE Solutions in Millions of Dimensions' abstract: 'Probabilistic solvers for ordinary differential equations (ODEs) have emerged as an efficient framework for uncertainty quantification and inference on dynamical systems. In this work, we explain the mathematical assumptions and detailed implementation schemes behind solving high-dimensional ODEs with a probabilistic numerical algorithm. This has not been possible before due to matrix-matrix operations in each solver step, but is crucial for scientifically relevant problems—most importantly, the solution of discretised partial differential equations. In a nutshell, efficient high-dimensional probabilistic ODE solutions build either on independence assumptions or on Kronecker structure in the prior model. We evaluate the resulting efficiency on a range of problems, including the probabilistic numerical simulation of a differential equation with millions of dimensions.' 
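The remark in the abstract above, that efficient high-dimensional probabilistic ODE solutions build on independence assumptions or Kronecker structure in the prior, can be illustrated with a toy computation. This is a generic sketch of why Kronecker-factored matrices are cheap to apply, not the authors' solver; the sizes `q` and `d` are arbitrary toy values chosen for the example.

```python
# Illustrative sketch: a matrix-vector product with a Kronecker-factored
# matrix K kron I_d never needs the dense (q*d) x (q*d) matrix, because
# (K kron I_d) vec(X) = vec(X K^T) for column-stacking vec.  Toy sizes only.
import numpy as np

q, d = 4, 500                           # small derivative block, larger state dimension
rng = np.random.default_rng(0)
K = rng.standard_normal((q, q))
K = K @ K.T + np.eye(q)                 # a small SPD factor, standing in for a solver prior
v = rng.standard_normal(q * d)

# Dense route: O((q*d)^2) memory -- hopeless for d in the millions.
dense = np.kron(K, np.eye(d)) @ v

# Kronecker route: O(q*d) memory and O(q^2 * d) work.
X = v.reshape(q, d).T                   # d x q, arranged so that the column-stacked vec of X equals v
kron_free = (X @ K.T).reshape(-1, order='F')

assert np.allclose(dense, kron_free)
```

The same identity applies to every solver step that multiplies by the prior, which is one way to see how the cost can stay linear in the state dimension.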
volume: 162 URL: https://proceedings.mlr.press/v162/kramer22b.html PDF: https://proceedings.mlr.press/v162/kramer22b/kramer22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kramer22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nicholas family: Krämer - given: Nathanael family: Bosch - given: Jonathan family: Schmidt - given: Philipp family: Hennig editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11634-11649 id: kramer22b issued: date-parts: - 2022 - 6 - 28 firstpage: 11634 lastpage: 11649 published: 2022-06-28 00:00:00 +0000 - title: 'Active Nearest Neighbor Regression Through Delaunay Refinement' abstract: 'We introduce an algorithm for active function approximation based on nearest neighbor regression. Our Active Nearest Neighbor Regressor (ANNR) relies on the Voronoi-Delaunay framework from computational geometry to subdivide the space into cells with constant estimated function value and select novel query points in a way that takes the geometry of the function graph into account. We consider the recent state-of-the-art active function approximator called DEFER, which is based on incremental rectangular partitioning of the space, as the main baseline. The ANNR addresses a number of limitations that arise from the space subdivision strategy used in DEFER. We provide a computationally efficient implementation of our method, as well as theoretical halting guarantees. Empirical results show that ANNR outperforms the baseline for both closed-form functions and real-world examples, such as gravitational wave parameter inference and exploration of the latent space of a generative model.' volume: 162 URL: https://proceedings.mlr.press/v162/kravberg22a.html PDF: https://proceedings.mlr.press/v162/kravberg22a/kravberg22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kravberg22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexander family: Kravberg - given: Giovanni Luca family: Marchetti - given: Vladislav family: Polianskii - given: Anastasiia family: Varava - given: Florian T. family: Pokorny - given: Danica family: Kragic editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11650-11664 id: kravberg22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11650 lastpage: 11664 published: 2022-06-28 00:00:00 +0000 - title: 'Functional Generalized Empirical Likelihood Estimation for Conditional Moment Restrictions' abstract: 'Important problems in causal inference, economics, and, more generally, robust machine learning can be expressed as conditional moment restrictions, but estimation becomes challenging as it requires solving a continuum of unconditional moment restrictions. Previous works addressed this problem by extending the generalized method of moments (GMM) to continuum moment restrictions. In contrast, generalized empirical likelihood (GEL) provides a more general framework and has been shown to enjoy favorable small-sample properties compared to GMM-based estimators. 
To benefit from recent developments in machine learning, we provide a functional reformulation of GEL in which arbitrary models can be leveraged. Motivated by a dual formulation of the resulting infinite dimensional optimization problem, we devise a practical method and explore its asymptotic properties. Finally, we provide kernel- and neural network-based implementations of the estimator, which achieve state-of-the-art empirical performance on two conditional moment restriction problems.' volume: 162 URL: https://proceedings.mlr.press/v162/kremer22a.html PDF: https://proceedings.mlr.press/v162/kremer22a/kremer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kremer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Heiner family: Kremer - given: Jia-Jie family: Zhu - given: Krikamol family: Muandet - given: Bernhard family: Schölkopf editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11665-11682 id: kremer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11665 lastpage: 11682 published: 2022-06-28 00:00:00 +0000 - title: 'Calibrated and Sharp Uncertainties in Deep Learning via Density Estimation' abstract: 'Accurate probabilistic predictions can be characterized by two properties: calibration and sharpness. However, standard maximum likelihood training yields models that are poorly calibrated and thus inaccurate: a 90% confidence interval typically does not contain the true outcome 90% of the time. This paper argues that calibration is important in practice and is easy to maintain by performing low-dimensional density estimation. We introduce a simple training procedure based on recalibration that yields calibrated models without sacrificing overall performance; unlike previous approaches, ours ensures the most general property of distribution calibration and applies to any model, including neural networks. We formally prove the correctness of our procedure assuming that we can estimate densities in low dimensions and we establish uniform convergence bounds. Our results yield empirical performance improvements on linear and deep Bayesian models and suggest that calibration should be increasingly leveraged across machine learning.' volume: 162 URL: https://proceedings.mlr.press/v162/kuleshov22a.html PDF: https://proceedings.mlr.press/v162/kuleshov22a/kuleshov22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kuleshov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Volodymyr family: Kuleshov - given: Shachi family: Deshpande editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11683-11693 id: kuleshov22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11683 lastpage: 11693 published: 2022-06-28 00:00:00 +0000 - title: 'ActiveHedge: Hedge meets Active Learning' abstract: 'We consider the classical problem of multi-class prediction with expert advice, but with an active learning twist.
In this new setting the learner will only query the labels of a small number of examples, but still aims to minimize regret to the best expert as usual; the learner is also allowed a very short "burn-in" phase where it can fast-forward and query certain highly-informative examples. We design an algorithm that utilizes Hedge (aka Exponential Weights) as a subroutine, and we show that under a very particular combinatorial constraint on the matrix of expert predictions we can obtain a very strong regret guarantee while querying very few labels. This constraint, which we refer to as $\zeta$-compactness, or just compactness, can be viewed as a non-stochastic variant of the disagreement coefficient, another popular parameter used to reason about the sample complexity of active learning in the IID setting. We also give a polynomial-time algorithm to calculate the $\zeta$-compactness of a matrix up to an approximation factor of 3.' volume: 162 URL: https://proceedings.mlr.press/v162/kumar22a.html PDF: https://proceedings.mlr.press/v162/kumar22a/kumar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kumar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bhuvesh family: Kumar - given: Jacob D family: Abernethy - given: Venkatesh family: Saligrama editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11694-11709 id: kumar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11694 lastpage: 11709 published: 2022-06-28 00:00:00 +0000 - title: 'Balancing Discriminability and Transferability for Source-Free Domain Adaptation' abstract: 'Conventional domain adaptation (DA) techniques aim to improve domain transferability by learning domain-invariant representations; while concurrently preserving the task-discriminability knowledge gathered from the labeled source data. However, the requirement of simultaneous access to labeled source and unlabeled target renders them unsuitable for the challenging source-free DA setting. The trivial solution of realizing an effective original to generic domain mapping improves transferability but degrades task discriminability. Upon analyzing the hurdles from both theoretical and empirical standpoints, we derive novel insights to show that a mixup between original and corresponding translated generic samples enhances the discriminability-transferability trade-off while duly respecting the privacy-oriented source-free setting. A simple but effective realization of the proposed insights on top of the existing source-free DA approaches yields state-of-the-art performance with faster convergence. Beyond single-source, we also outperform multi-source prior-arts across both classification and semantic segmentation benchmarks.' 
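The mixup idea in the abstract above can be made concrete with a minimal sketch, with the caveat that this is a generic reading of the recipe rather than the authors' pipeline: each original target sample is mixed with its corresponding generic-domain translation using a Beta-sampled coefficient, and the mixed batch replaces the raw batch during source-free adaptation. The function name, the Beta parameter `alpha`, and the random tensors are illustrative assumptions.

```python
# Illustrative sketch (a generic reading of the idea, not the authors' code):
# mix each original target sample with its generic-domain translation.
import torch

def mix_original_and_translated(x_orig: torch.Tensor,
                                x_generic: torch.Tensor,
                                alpha: float = 0.4) -> torch.Tensor:
    """x_orig, x_generic: (B, C, H, W); x_generic[i] is assumed to be the
    generic-domain translation of x_orig[i].  alpha parameterises the Beta
    prior on the mixing coefficient, as in standard mixup."""
    lam = torch.distributions.Beta(alpha, alpha).sample()  # scalar mixing coefficient in (0, 1)
    return lam * x_orig + (1.0 - lam) * x_generic

# Toy usage: the mixed batch is what the adapting model would see.
x_orig = torch.randn(8, 3, 32, 32)
x_generic = torch.randn(8, 3, 32, 32)   # stand-in for translated samples
x_mix = mix_original_and_translated(x_orig, x_generic)
```

Interpolating between the two views is what lets the representation retain target-specific discriminative detail while still benefiting from the transferable generic domain, which is the trade-off the abstract describes.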
volume: 162 URL: https://proceedings.mlr.press/v162/kundu22a.html PDF: https://proceedings.mlr.press/v162/kundu22a/kundu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kundu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jogendra Nath family: Kundu - given: Akshay R family: Kulkarni - given: Suvaansh family: Bhambri - given: Deepesh family: Mehta - given: Shreyas Anand family: Kulkarni - given: Varun family: Jampani - given: Venkatesh Babu family: Radhakrishnan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11710-11728 id: kundu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11710 lastpage: 11728 published: 2022-06-28 00:00:00 +0000 - title: 'Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters' abstract: 'In this work, we argue for the importance of an online evaluation budget for a reliable comparison of deep offline RL algorithms. First, we delineate that the online evaluation budget is problem-dependent, where some problems allow for less but others for more. And second, we demonstrate that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management. Following the points above, we suggest reporting the performance of deep offline RL algorithms under varying online evaluation budgets. To facilitate this, we propose to use a reporting tool from the NLP field, Expected Validation Performance. This technique makes it possible to reliably estimate expected maximum performance under different budgets while not requiring any additional computation beyond hyperparameter search. By employing this tool, we also show that Behavioral Cloning is often more favorable to offline RL algorithms when working within a limited budget.' volume: 162 URL: https://proceedings.mlr.press/v162/kurenkov22a.html PDF: https://proceedings.mlr.press/v162/kurenkov22a/kurenkov22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kurenkov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vladislav family: Kurenkov - given: Sergey family: Kolesnikov editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11729-11752 id: kurenkov22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11729 lastpage: 11752 published: 2022-06-28 00:00:00 +0000 - title: 'Equivariant Priors for compressed sensing with unknown orientation' abstract: 'In compressed sensing, the goal is to reconstruct the signal from an underdetermined system of linear measurements. Thus, prior knowledge about the signal of interest and its structure is required. Additionally, in many scenarios, the signal has an unknown orientation prior to measurements. To address such recovery problems, we propose using equivariant generative models as a prior, which encapsulate orientation information in their latent space. 
Thereby, we show that signals with unknown orientations can be recovered with iterative gradient descent on the latent space of these models and provide additional theoretical recovery guarantees. We construct an equivariant variational autoencoder and use the decoder as generative prior for compressed sensing. We discuss additional potential gains of the proposed approach in terms of convergence and latency.' volume: 162 URL: https://proceedings.mlr.press/v162/kuzina22a.html PDF: https://proceedings.mlr.press/v162/kuzina22a/kuzina22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kuzina22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Anna family: Kuzina - given: Kumar family: Pratik - given: Fabio Valerio family: Massoli - given: Arash family: Behboodi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11753-11771 id: kuzina22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11753 lastpage: 11771 published: 2022-06-28 00:00:00 +0000 - title: 'Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms' abstract: 'Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction $\alpha < 1/2$ of tasks (users) are arbitrary and adversarial. The remaining fraction of good users share the same instance of contextual bandits with $S$ contexts and $A$ actions (items). Naturally, whether a user is good or adversarial is not known in advance. The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible. Without adversarial users, established results in collaborative filtering show that $O(1/\epsilon^2)$ per-user interactions suffice to learn a good policy, precisely because information can be shared across users. This parallelization gain is fundamentally altered by the presence of adversarial users: unless there are super-polynomial number of users, we show a lower bound of $\tilde{\Omega}(\min(S,A) \cdot \alpha^2 / \epsilon^2)$ per-user interactions to learn an $\epsilon$-optimal policy for the good users. We then show we can achieve an $\tilde{O}(\min(S,A)\cdot \alpha/\epsilon^2)$ upper-bound, by employing efficient robust mean estimators for both uni-variate and high-dimensional random variables. We also show that this can be improved depending on the distributions of contexts.' 
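The upper bound in the abstract above hinges on "efficient robust mean estimators", but the abstract does not say which ones; the following is therefore only a generic univariate example, median-of-means, to illustrate what such an estimator does when a few rewards are adversarially corrupted. The block count, the seed, and the toy reward data are assumptions for the demo.

```python
# Illustrative sketch: median-of-means, one standard robust mean estimator
# for univariate observations (the specific estimators used in the paper are
# not spelled out in the abstract; this is a generic example).
import numpy as np

def median_of_means(x, n_blocks: int = 7, seed: int = 0) -> float:
    """Shuffle, split into n_blocks groups, average each group, and return the
    median of the group means; a minority of corrupted samples can spoil only
    a minority of blocks."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    return float(np.median([b.mean() for b in np.array_split(x, n_blocks)]))

# Toy usage: 3 corrupted rewards drag the plain mean far from 0.5,
# while the median-of-means estimate stays close to it.
rewards = np.concatenate([np.random.default_rng(1).normal(0.5, 0.1, 97),
                          np.full(3, 100.0)])
print(np.mean(rewards), median_of_means(rewards))
```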
volume: 162 URL: https://proceedings.mlr.press/v162/kwon22a.html PDF: https://proceedings.mlr.press/v162/kwon22a/kwon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-kwon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jeongyeol family: Kwon - given: Yonathan family: Efroni - given: Constantine family: Caramanis - given: Shie family: Mannor editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11772-11789 id: kwon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11772 lastpage: 11789 published: 2022-06-28 00:00:00 +0000 - title: 'Large Batch Experience Replay' abstract: 'Several algorithms have been proposed to sample the replay buffer of deep Reinforcement Learning (RL) agents non-uniformly to speed up learning, but very few theoretical foundations of these sampling schemes have been provided. Among others, Prioritized Experience Replay appears as a hyperparameter-sensitive heuristic, even though it can provide good performance. In this work, we cast the replay buffer sampling problem as an importance sampling one for estimating the gradient. This allows deriving the theoretically optimal sampling distribution, yielding the best theoretical convergence speed. Elaborating on the knowledge of the ideal sampling scheme, we exhibit new theoretical foundations of Prioritized Experience Replay. The optimal sampling distribution being intractable, we make several approximations providing good results in practice and introduce, among others, LaBER (Large Batch Experience Replay), an easy-to-code and efficient method for sampling the replay buffer. LaBER, which can be combined with Deep Q-Networks, distributional RL agents or actor-critic methods, yields improved performance over a diverse range of Atari games and PyBullet environments, compared to the base agent it is implemented on and to other prioritization schemes.' volume: 162 URL: https://proceedings.mlr.press/v162/lahire22a.html PDF: https://proceedings.mlr.press/v162/lahire22a/lahire22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lahire22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Thibault family: Lahire - given: Matthieu family: Geist - given: Emmanuel family: Rachelson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11790-11813 id: lahire22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11790 lastpage: 11813 published: 2022-06-28 00:00:00 +0000 - title: 'FedScale: Benchmarking Model and System Performance of Federated Learning at Scale' abstract: 'We present FedScale, a federated learning (FL) benchmarking suite with realistic datasets and a scalable runtime to enable reproducible FL research. FedScale datasets encompass a wide range of critical FL tasks, ranging from image classification and object detection to language modeling and speech recognition. Each dataset comes with a unified evaluation protocol using real-world data splits and evaluation metrics.
To reproduce realistic FL behavior, FedScale contains a scalable and extensible runtime. It provides high-level APIs to implement FL algorithms, deploy them at scale across diverse hardware and software backends, and evaluate them at scale, all with minimal developer efforts. We combine the two to perform systematic benchmarking experiments and highlight potential opportunities for heterogeneity-aware co-optimizations in FL. FedScale is open-source and actively maintained by contributors from different institutions at http://fedscale.ai. We welcome feedback and contributions from the community.' volume: 162 URL: https://proceedings.mlr.press/v162/lai22a.html PDF: https://proceedings.mlr.press/v162/lai22a/lai22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lai22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fan family: Lai - given: Yinwei family: Dai - given: Sanjay family: Singapuram - given: Jiachen family: Liu - given: Xiangfeng family: Zhu - given: Harsha family: Madhyastha - given: Mosharaf family: Chowdhury editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11814-11827 id: lai22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11814 lastpage: 11827 published: 2022-06-28 00:00:00 +0000 - title: 'Smoothed Adaptive Weighting for Imbalanced Semi-Supervised Learning: Improve Reliability Against Unknown Distribution Data' abstract: 'Despite recent promising results on semi-supervised learning (SSL), data imbalance, particularly in the unlabeled dataset, could significantly impact the training performance of a SSL algorithm if there is a mismatch between the expected and actual class distributions. The efforts on how to construct a robust SSL framework that can effectively learn from datasets with unknown distributions remain limited. We first investigate the feasibility of adding weights to the consistency loss and then we verify the necessity of smoothed weighting schemes. Based on this study, we propose a self-adaptive algorithm, named Smoothed Adaptive Weighting (SAW). SAW is designed to enhance the robustness of SSL by estimating the learning difficulty of each class and synthesizing the weights in the consistency loss based on such estimation. We show that SAW can complement recent consistency-based SSL algorithms and improve their reliability on various datasets including three standard datasets and one gigapixel medical imaging application without making any assumptions about the distribution of the unlabeled set.' 
volume: 162 URL: https://proceedings.mlr.press/v162/lai22b.html PDF: https://proceedings.mlr.press/v162/lai22b/lai22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lai22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhengfeng family: Lai - given: Chao family: Wang - given: Henrry family: Gunawan - given: Sen-Ching S family: Cheung - given: Chen-Nee family: Chuah editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11828-11843 id: lai22b issued: date-parts: - 2022 - 6 - 28 firstpage: 11828 lastpage: 11843 published: 2022-06-28 00:00:00 +0000 - title: 'Functional Output Regression with Infimal Convolution: Exploring the Huber and $ε$-insensitive Losses' abstract: 'The focus of the paper is functional output regression (FOR) with convoluted losses. While most existing work considers the square loss setting, we leverage extensions of the Huber and the $\epsilon$-insensitive loss (induced by infimal convolution) and propose a flexible framework capable of handling various forms of outliers and sparsity in the FOR family. We derive computationally tractable algorithms relying on duality to tackle the resulting tasks in the context of vector-valued reproducing kernel Hilbert spaces. The efficiency of the approach is demonstrated and contrasted with the classical squared loss setting on both synthetic and real-world benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/lambert22a.html PDF: https://proceedings.mlr.press/v162/lambert22a/lambert22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lambert22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alex family: Lambert - given: Dimitri family: Bouche - given: Zoltan family: Szabo - given: Florence family: D’Alché-Buc editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11844-11867 id: lambert22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11844 lastpage: 11867 published: 2022-06-28 00:00:00 +0000 - title: 'Tell me why! Explanations support learning relational and causal structure' abstract: 'Inferring the abstract relational and causal structure of the world is a major challenge for reinforcement-learning (RL) agents. For humans, language, particularly in the form of explanations, plays a considerable role in overcoming this challenge. Here, we show that language can play a similar role for deep RL agents in complex environments. While agents typically struggle to acquire relational and causal knowledge, augmenting their experience by training them to predict language descriptions and explanations can overcome these limitations. We show that language can help agents learn challenging relational tasks, and examine which aspects of language contribute to its benefits. We then show that explanations can help agents to infer not only relational but also causal structure.
Language can shape the way that agents generalize out-of-distribution from ambiguous, causally-confounded training, and explanations even allow agents to learn to perform experimental interventions to identify causal relationships. Our results suggest that language description and explanation may be powerful tools for improving agent learning and generalization.' volume: 162 URL: https://proceedings.mlr.press/v162/lampinen22a.html PDF: https://proceedings.mlr.press/v162/lampinen22a/lampinen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lampinen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrew K family: Lampinen - given: Nicholas family: Roy - given: Ishita family: Dasgupta - given: Stephanie Cy family: Chan - given: Allison family: Tam - given: James family: Mcclelland - given: Chen family: Yan - given: Adam family: Santoro - given: Neil C family: Rabinowitz - given: Jane family: Wang - given: Felix family: Hill editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11868-11890 id: lampinen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11868 lastpage: 11890 published: 2022-06-28 00:00:00 +0000 - title: 'Generative Cooperative Networks for Natural Language Generation' abstract: 'Generative Adversarial Networks (GANs) have achieved tremendous success in many continuous generation tasks, especially in the field of image generation. However, for discrete outputs such as language, optimizing GANs remains an open problem with many instabilities, as no gradient can be properly back-propagated from the discriminator output to the generator parameters. An alternative is to learn the generator network via reinforcement learning, using the discriminator signal as a reward, but such a technique suffers from moving rewards and vanishing gradient problems. Moreover, it often falls short of direct maximum-likelihood approaches. In this paper, we introduce Generative Cooperative Networks, in which the discriminator architecture is cooperatively used along with the generation policy to output samples of realistic texts for the task at hand. We give theoretical guarantees of convergence for our approach, and study various efficient decoding schemes to empirically achieve state-of-the-art results in two main NLG tasks.'
volume: 162 URL: https://proceedings.mlr.press/v162/lamprier22a.html PDF: https://proceedings.mlr.press/v162/lamprier22a/lamprier22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lamprier22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sylvain family: Lamprier - given: Thomas family: Scialom - given: Antoine family: Chaffin - given: Vincent family: Claveau - given: Ewa family: Kijak - given: Jacopo family: Staiano - given: Benjamin family: Piwowarski editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11891-11905 id: lamprier22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11891 lastpage: 11905 published: 2022-06-28 00:00:00 +0000 - title: 'DSTAGNN: Dynamic Spatial-Temporal Aware Graph Neural Network for Traffic Flow Forecasting' abstract: 'As a typical problem in time series analysis, traffic flow prediction is one of the most important application fields of machine learning. However, achieving highly accurate traffic flow prediction is a challenging task, due to the presence of complex dynamic spatial-temporal dependencies within a road network. This paper proposes a novel Dynamic Spatial-Temporal Aware Graph Neural Network (DSTAGNN) to model the complex spatial-temporal interaction in road network. First, considering the fact that historical data carries intrinsic dynamic information about the spatial structure of road networks, we propose a new dynamic spatial-temporal aware graph based on a data-driven strategy to replace the pre-defined static graph usually used in traditional graph convolution. Second, we design a novel graph neural network architecture, which can not only represent dynamic spatial relevance among nodes with an improved multi-head attention mechanism, but also acquire the wide range of dynamic temporal dependency from multi-receptive field features via multi-scale gated convolution. Extensive experiments on real-world data sets demonstrate that our proposed method significantly outperforms the state-of-the-art methods.' volume: 162 URL: https://proceedings.mlr.press/v162/lan22a.html PDF: https://proceedings.mlr.press/v162/lan22a/lan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shiyong family: Lan - given: Yitong family: Ma - given: Weikang family: Huang - given: Wenwu family: Wang - given: Hongyu family: Yang - given: Pyang family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11906-11917 id: lan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11906 lastpage: 11917 published: 2022-06-28 00:00:00 +0000 - title: 'Cooperative Online Learning in Stochastic and Adversarial MDPs' abstract: 'We study cooperative online learning in stochastic and adversarial Markov decision process (MDP). That is, in each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret. 
We consider environments with two types of randomness: fresh – where each agent’s trajectory is sampled i.i.d., and non-fresh – where the realization is shared by all agents (but each agent’s trajectory is also affected by its own actions). More precisely, with non-fresh randomness the realization of every cost and transition is fixed at the start of each episode, and agents that take the same action in the same state at the same time observe the same cost and next state. We thoroughly analyze all relevant settings, highlight the challenges and differences between the models, and prove nearly-matching regret lower and upper bounds. To our knowledge, we are the first to consider cooperative reinforcement learning (RL) either with non-fresh randomness or in adversarial MDPs.' volume: 162 URL: https://proceedings.mlr.press/v162/lancewicki22a.html PDF: https://proceedings.mlr.press/v162/lancewicki22a/lancewicki22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lancewicki22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tal family: Lancewicki - given: Aviv family: Rosenberg - given: Yishay family: Mansour editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11918-11968 id: lancewicki22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11918 lastpage: 11968 published: 2022-06-28 00:00:00 +0000 - title: 'PINs: Progressive Implicit Networks for Multi-Scale Neural Representations' abstract: 'Multi-layer perceptrons (MLPs) have proven to be effective scene encoders when combined with higher-dimensional projections of the input, commonly referred to as positional encoding. However, scenes with a wide frequency spectrum remain a challenge: choosing high frequencies for positional encoding introduces noise in low structure areas, while low frequencies result in poor fitting of detailed regions. To address this, we propose a progressive positional encoding, exposing a hierarchical MLP structure to incremental sets of frequency encodings. Our model accurately reconstructs scenes with wide frequency bands and learns a scene representation at progressive levels of detail without explicit per-level supervision. The architecture is modular: each level encodes a continuous implicit representation that can be leveraged separately for its respective resolution, meaning a smaller network for coarser reconstructions. Experiments on several 2D and 3D datasets show improvements in reconstruction accuracy, representational capacity and training speed compared to baselines.'
volume: 162 URL: https://proceedings.mlr.press/v162/landgraf22a.html PDF: https://proceedings.mlr.press/v162/landgraf22a/landgraf22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-landgraf22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zoe family: Landgraf - given: Alexander Sorkine family: Hornung - given: Ricardo S family: Cabral editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11969-11984 id: landgraf22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11969 lastpage: 11984 published: 2022-06-28 00:00:00 +0000 - title: 'Co-training Improves Prompt-based Learning for Large Language Models' abstract: 'We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data. While prompting has emerged as a promising paradigm for few-shot and zero-shot learning, it is often brittle and requires much larger models compared to the standard supervised setup. We find that co-training makes it possible to improve the original prompt model and at the same time learn a smaller, downstream task-specific model. In the case where we only have partial access to a prompt model (e.g., output probabilities from GPT-3 (Brown et al., 2020)) we learn a calibration model over the prompt outputs. When we have full access to the prompt model’s gradients but full finetuning remains prohibitively expensive (e.g., T0 (Sanh et al., 2021)), we learn a set of soft prompt continuous vectors to iteratively update the prompt model. We find that models trained in this manner can significantly improve performance on challenging datasets where there is currently a large gap between prompt-based learning and fully-supervised models.' volume: 162 URL: https://proceedings.mlr.press/v162/lang22a.html PDF: https://proceedings.mlr.press/v162/lang22a/lang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hunter family: Lang - given: Monica N family: Agrawal - given: Yoon family: Kim - given: David family: Sontag editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 11985-12003 id: lang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 11985 lastpage: 12003 published: 2022-06-28 00:00:00 +0000 - title: 'Goal Misgeneralization in Deep Reinforcement Learning' abstract: 'We study goal misgeneralization, a type of out-of-distribution robustness failure in reinforcement learning (RL). Goal misgeneralization occurs when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused on capability generalization failures, where an agent fails to do anything sensible at test time. We provide the first explicit empirical demonstrations of goal misgeneralization and present a partial characterization of its causes.'
volume: 162 URL: https://proceedings.mlr.press/v162/langosco22a.html PDF: https://proceedings.mlr.press/v162/langosco22a/langosco22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-langosco22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lauro Langosco Di family: Langosco - given: Jack family: Koch - given: Lee D family: Sharkey - given: Jacob family: Pfau - given: David family: Krueger editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12004-12019 id: langosco22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12004 lastpage: 12019 published: 2022-06-28 00:00:00 +0000 - title: 'Marginal Tail-Adaptive Normalizing Flows' abstract: 'Learning the tail behavior of a distribution is a notoriously difficult problem. By definition, the number of samples from the tail is small, and deep generative models, such as normalizing flows, tend to concentrate on learning the body of the distribution. In this paper, we focus on improving the ability of normalizing flows to correctly capture the tail behavior and, thus, form more accurate models. We prove that the marginal tailedness of an autoregressive flow can be controlled via the tailedness of the marginals of its base distribution. This theoretical insight leads us to a novel type of flows based on flexible base distributions and data-driven linear layers. An empirical analysis shows that the proposed method improves on the accuracy{—}especially on the tails of the distribution{—}and is able to generate heavy-tailed data. We demonstrate its application on a weather and climate example, in which capturing the tail behavior is essential.' volume: 162 URL: https://proceedings.mlr.press/v162/laszkiewicz22a.html PDF: https://proceedings.mlr.press/v162/laszkiewicz22a/laszkiewicz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-laszkiewicz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mike family: Laszkiewicz - given: Johannes family: Lederer - given: Asja family: Fischer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12020-12048 id: laszkiewicz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12020 lastpage: 12048 published: 2022-06-28 00:00:00 +0000 - title: 'Bregman Proximal Langevin Monte Carlo via Bregman-Moreau Envelopes' abstract: 'We propose efficient Langevin Monte Carlo algorithms for sampling distributions with nonsmooth convex composite potentials, which is the sum of a continuously differentiable function and a possibly nonsmooth function. We devise such algorithms leveraging recent advances in convex analysis and optimization methods involving Bregman divergences, namely the Bregman–Moreau envelopes and the Bregman proximity operators, and in the Langevin Monte Carlo algorithms reminiscent of mirror descent. 
The proposed algorithms extend existing Langevin Monte Carlo algorithms in two aspects—the ability to sample nonsmooth distributions with mirror descent-like algorithms, and the use of the more general Bregman–Moreau envelope in place of the Moreau envelope as a smooth approximation of the nonsmooth part of the potential. A particular case of the proposed scheme is reminiscent of the Bregman proximal gradient algorithm. The efficiency of the proposed methodology is illustrated with various sampling tasks at which existing Langevin Monte Carlo methods are known to perform poorly.' volume: 162 URL: https://proceedings.mlr.press/v162/lau22a.html PDF: https://proceedings.mlr.press/v162/lau22a/lau22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lau22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tim Tsz-Kit family: Lau - given: Han family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12049-12077 id: lau22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12049 lastpage: 12077 published: 2022-06-28 00:00:00 +0000 - title: 'Scalable Deep Reinforcement Learning Algorithms for Mean Field Games' abstract: 'Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor in scaling up with RL is that existing algorithms for solving MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from trivial for non-linear function approximators that enjoy good generalization properties, e.g., neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform SotA baselines from the literature.'
volume: 162 URL: https://proceedings.mlr.press/v162/lauriere22a.html PDF: https://proceedings.mlr.press/v162/lauriere22a/lauriere22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lauriere22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mathieu family: Lauriere - given: Sarah family: Perrin - given: Sertan family: Girgin - given: Paul family: Muller - given: Ayush family: Jain - given: Theophile family: Cabannes - given: Georgios family: Piliouras - given: Julien family: Perolat - given: Romuald family: Elie - given: Olivier family: Pietquin - given: Matthieu family: Geist editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12078-12095 id: lauriere22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12078 lastpage: 12095 published: 2022-06-28 00:00:00 +0000 - title: 'Implicit Bias of Linear Equivariant Networks' abstract: 'Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) which excel in a wide range of technical applications by explicitly encoding symmetries, such as rotations and permutations, in their architectures. Although the success of G-CNNs is driven by their explicit symmetry bias, a recent line of work has proposed that the implicit bias of training algorithms on particular architectures is key to understanding generalization for overparameterized neural nets. In this context, we show that L-layer full-width linear G-CNNs trained via gradient descent for binary classification converge to solutions with low-rank Fourier matrix coefficients, regularized by the 2/L-Schatten matrix norm. Our work strictly generalizes previous analysis on the implicit bias of linear CNNs to linear G-CNNs over all finite groups, including the challenging setting of non-commutative groups (such as permutations), as well as band-limited G-CNNs over infinite groups. We validate our theorems via experiments on a variety of groups, and empirically explore more realistic nonlinear networks, which locally capture similar regularization patterns. Finally, we provide intuitive interpretations of our Fourier space implicit regularization results in real space via uncertainty principles.' volume: 162 URL: https://proceedings.mlr.press/v162/lawrence22a.html PDF: https://proceedings.mlr.press/v162/lawrence22a/lawrence22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lawrence22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hannah family: Lawrence - given: Kristian family: Georgiev - given: Andrew family: Dienes - given: Bobak T. family: Kiani editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12096-12125 id: lawrence22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12096 lastpage: 12125 published: 2022-06-28 00:00:00 +0000 - title: 'Differentially Private Maximal Information Coefficients' abstract: 'The Maximal Information Coefficient (MIC) is a powerful statistic to identify dependencies between variables. 
However, it may be applied to sensitive data, and publishing it could leak private information. As a solution, we present algorithms to approximate MIC in a way that provides differential privacy. We show that the natural application of the classic Laplace mechanism yields insufficient accuracy. We therefore introduce the MICr statistic, which is a new MIC approximation that is more compatible with differential privacy. We prove MICr is a consistent estimator for MIC, and we provide two differentially private versions of it. We perform experiments on a variety of real and synthetic datasets. The results show that the private MICr statistics significantly outperform direct application of the Laplace mechanism. Moreover, experiments on real-world datasets show accuracy that is usable when the sample size is at least moderately large.' volume: 162 URL: https://proceedings.mlr.press/v162/lazarsfeld22a.html PDF: https://proceedings.mlr.press/v162/lazarsfeld22a/lazarsfeld22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lazarsfeld22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: John family: Lazarsfeld - given: Aaron family: Johnson - given: Emmanuel family: Adeniran editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12126-12163 id: lazarsfeld22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12126 lastpage: 12163 published: 2022-06-28 00:00:00 +0000 - title: 'Entropic Gromov-Wasserstein between Gaussian Distributions' abstract: 'We study the entropic Gromov-Wasserstein and its unbalanced version between (unbalanced) Gaussian distributions with different dimensions. When the metric is the inner product, which we refer to as inner product Gromov-Wasserstein (IGW), we demonstrate that the optimal transportation plans of entropic IGW and its unbalanced variant are (unbalanced) Gaussian distributions. Via an application of von Neumann’s trace inequality, we obtain closed-form expressions for the entropic IGW between these Gaussian distributions. Finally, we consider an entropic inner product Gromov-Wasserstein barycenter of multiple Gaussian distributions. We prove that the barycenter is a Gaussian distribution when the entropic regularization parameter is small. We further derive a closed-form expression for the covariance matrix of the barycenter.' 
volume: 162 URL: https://proceedings.mlr.press/v162/le22a.html PDF: https://proceedings.mlr.press/v162/le22a/le22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-le22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Khang family: Le - given: Dung Q family: Le - given: Huy family: Nguyen - given: Dat family: Do - given: Tung family: Pham - given: Nhat family: Ho editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12164-12203 id: le22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12164 lastpage: 12203 published: 2022-06-28 00:00:00 +0000 - title: 'Neurocoder: General-Purpose Computation Using Stored Neural Programs' abstract: 'Artificial Neural Networks are functionally equivalent to special-purpose computers. Their inter-neuronal connection weights represent the learnt Neural Program that instructs the networks on how to compute the data. However, without storing Neural Programs, they are restricted to only one, overwriting learnt programs when trained on new data. Here we design Neurocoder, a new class of general-purpose neural networks in which the neural network “codes” itself in a data-responsive way by composing relevant programs from a set of shareable, modular programs stored in external memory. This time, a Neural Program is efficiently treated as data in memory. Integrating Neurocoder into current neural architectures, we demonstrate new capacity to learn modular programs, reuse simple programs to build complex ones, handle pattern shifts and remember old programs as new ones are learnt, and show substantial performance improvement in solving object recognition, playing video games and continual learning tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/le22b.html PDF: https://proceedings.mlr.press/v162/le22b/le22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-le22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hung family: Le - given: Svetha family: Venkatesh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12204-12221 id: le22b issued: date-parts: - 2022 - 6 - 28 firstpage: 12204 lastpage: 12221 published: 2022-06-28 00:00:00 +0000 - title: 'Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime' abstract: 'We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, and entropy-regularized Markov decision processes (MDPs). We consider a softmax policy with (one-hidden layer) neural network approximation in a mean-field regime. Additional entropic regularization in the associated mean-field probability measure is added, and the corresponding gradient flow is studied in the 2-Wasserstein metric. We show that the objective function is increasing along the gradient flow. 
Further, we prove that if the regularization in terms of the mean-field measure is sufficient, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective. Lastly, we study the sensitivity of the value function along the gradient flow with respect to regularization parameters and the initial condition. Our results rely on the careful analysis of the non-linear Fokker–Planck–Kolmogorov equation and extend the pioneering work of \cite{mei2020global} and \cite{agarwal2020optimality}, which quantify the global convergence rate of policy gradient for entropy-regularized MDPs in the tabular setting.' volume: 162 URL: https://proceedings.mlr.press/v162/leahy22a.html PDF: https://proceedings.mlr.press/v162/leahy22a/leahy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-leahy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: James-Michael family: Leahy - given: Bekzhan family: Kerimkulov - given: David family: Siska - given: Lukasz family: Szpruch editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12222-12252 id: leahy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12222 lastpage: 12252 published: 2022-06-28 00:00:00 +0000 - title: 'A Random Matrix Analysis of Data Stream Clustering: Coping With Limited Memory Resources' abstract: 'This article introduces a random matrix framework for the analysis of clustering on high-dimensional data streams, a particularly relevant setting for a more sober processing of large amounts of data with limited memory and energy resources. Assuming data $\mathbf{x}_1, \mathbf{x}_2, \ldots$ arrives as a continuous flow and a small number $L$ of them can be kept in the learning pipeline, one has only access to the diagonal elements of the Gram kernel matrix: $\left[ \mathbf{K}_L \right]_{i, j} = \frac{1}{p} \mathbf{x}_i^\top \mathbf{x}_j \mathbf{1}_{\left\lvert i - j \right\rvert < L}$. Under a large-dimensional data regime, we derive the limiting spectral distribution of the banded kernel matrix $\mathbf{K}_L$ and study its isolated eigenvalues and eigenvectors, which behave in an unfamiliar way. We detail how these results can be used to perform efficient online kernel spectral clustering and provide theoretical performance guarantees. Our findings are empirically confirmed on image clustering tasks. Leveraging on optimality results of spectral methods for clustering, this work offers insights on efficient online clustering techniques for high-dimensional data.' 
volume: 162 URL: https://proceedings.mlr.press/v162/lebeau22a.html PDF: https://proceedings.mlr.press/v162/lebeau22a/lebeau22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lebeau22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hugo family: Lebeau - given: Romain family: Couillet - given: Florent family: Chatelain editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12253-12281 id: lebeau22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12253 lastpage: 12281 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Tangent Kernel Analysis of Deep Narrow Neural Networks' abstract: 'The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. We study the infinite-depth limit of a multilayer perceptron (MLP) with a specific initialization and establish a trainability guarantee using the NTK theory. We then extend the analysis to an infinitely deep convolutional neural network (CNN) and perform brief experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22a.html PDF: https://proceedings.mlr.press/v162/lee22a/lee22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jongmin family: Lee - given: Joo Young family: Choi - given: Ernest K family: Ryu - given: Albert family: No editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12282-12351 id: lee22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12282 lastpage: 12351 published: 2022-06-28 00:00:00 +0000 - title: 'Dataset Condensation with Contrastive Signals' abstract: 'Recent studies have demonstrated that gradient matching-based dataset synthesis, or dataset condensation (DC), methods can achieve state-of-the-art performance when applied to data-efficient learning tasks. However, in this study, we prove that the existing DC methods can perform worse than the random selection method when task-irrelevant information forms a significant part of the training dataset. We attribute this to the lack of contrastive signals between the classes, which results from the class-wise gradient matching strategy. To address this problem, we propose Dataset Condensation with Contrastive signals (DCC) by modifying the loss function to enable the DC methods to effectively capture the differences between classes. In addition, we analyze the new loss function in terms of training dynamics by tracking the kernel velocity. Furthermore, we introduce a bi-level warm-up strategy to stabilize the optimization. Our experimental results indicate that while the existing methods are ineffective for fine-grained image classification tasks, the proposed method can successfully generate informative synthetic datasets for the same tasks. 
Moreover, we demonstrate that the proposed method outperforms the baselines even on benchmark datasets such as SVHN, CIFAR-10, and CIFAR-100. Finally, we demonstrate the high applicability of the proposed method by applying it to continual learning tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22b.html PDF: https://proceedings.mlr.press/v162/lee22b/lee22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Saehyung family: Lee - given: Sanghyuk family: Chun - given: Sangwon family: Jung - given: Sangdoo family: Yun - given: Sungroh family: Yoon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12352-12364 id: lee22b issued: date-parts: - 2022 - 6 - 28 firstpage: 12352 lastpage: 12364 published: 2022-06-28 00:00:00 +0000 - title: 'Confidence Score for Source-Free Unsupervised Domain Adaptation' abstract: 'Source-free unsupervised domain adaptation (SFUDA) aims to obtain high performance in the unlabeled target domain using the pre-trained source model, not the source data. Existing SFUDA methods assign the same importance to all target samples, which is vulnerable to incorrect pseudo-labels. To differentiate between sample importance, in this study, we propose a novel sample-wise confidence score, the Joint Model-Data Structure (JMDS) score for SFUDA. Unlike existing confidence scores that use knowledge from only the source or the target domain, the JMDS score uses knowledge from both. We then propose a Confidence score Weighting Adaptation using the JMDS (CoWA-JMDS) framework for SFUDA. CoWA-JMDS uses the JMDS scores as sample weights together with weight Mixup, our proposed variant of Mixup. Weight Mixup encourages the model to make more use of the target domain knowledge. The experimental results show that the JMDS score outperforms the existing confidence scores. Moreover, CoWA-JMDS achieves state-of-the-art performance on various SFUDA scenarios: closed-set, open-set, and partial-set.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22c.html PDF: https://proceedings.mlr.press/v162/lee22c/lee22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jonghyun family: Lee - given: Dahuin family: Jung - given: Junho family: Yim - given: Sungroh family: Yoon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12365-12377 id: lee22c issued: date-parts: - 2022 - 6 - 28 firstpage: 12365 lastpage: 12377 published: 2022-06-28 00:00:00 +0000 - title: 'A Statistical Manifold Framework for Point Cloud Data' abstract: 'Many problems in machine learning involve data sets in which each data point is a point cloud in $\mathbb{R}^D$. A growing number of applications require a means of measuring not only distances between point clouds, but also angles, volumes, derivatives, and other more advanced concepts. 
To formulate and quantify these concepts in a coordinate-invariant way, we develop a Riemannian geometric framework for point cloud data. By interpreting each point in a point cloud as a sample drawn from some given underlying probability density, the space of point cloud data can be given the structure of a statistical manifold – each point on this manifold represents a point cloud – with the Fisher information metric acting as a natural Riemannian metric. Two autoencoder applications of our framework are presented: (i) smoothly deforming one 3D object into another via interpolation between the two corresponding point clouds; (ii) learning an optimal set of latent space coordinates for point cloud data that best preserves angles and distances, and thus produces a more discriminative representation space. Experiments with large-scale standard benchmark point cloud data show greatly improved classification accuracy vis-á-vis existing methods. Code is available at https://github.com/seungyeon-k/SMF-public.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22d.html PDF: https://proceedings.mlr.press/v162/lee22d/lee22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yonghyeon family: Lee - given: Seungyeon family: Kim - given: Jinwon family: Choi - given: Frank family: Park editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12378-12402 id: lee22d issued: date-parts: - 2022 - 6 - 28 firstpage: 12378 lastpage: 12402 published: 2022-06-28 00:00:00 +0000 - title: 'Low-Complexity Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Multiplexed Parallel Convolutions' abstract: 'Recently, the standard ResNet-20 network was successfully implemented on the fully homomorphic encryption scheme, residue number system variant Cheon-Kim-Kim-Song (RNS-CKKS) scheme using bootstrapping, but the implementation lacks practicality due to high latency and low security level. To improve the performance, we first minimize total bootstrapping runtime using multiplexed parallel convolution that collects sparse output data for multiple channels compactly. We also propose the imaginary-removing bootstrapping to prevent the deep neural networks from catastrophic divergence during approximate ReLU operations. In addition, we optimize level consumptions and use lighter and tighter parameters. Simulation results show that we have 4.67x lower inference latency and 134x less amortized runtime (runtime per image) for ResNet-20 compared to the state-of-the-art previous work, and we achieve standard 128-bit security. Furthermore, we successfully implement ResNet-110 with high accuracy on the RNS-CKKS scheme for the first time.' 
volume: 162 URL: https://proceedings.mlr.press/v162/lee22e.html PDF: https://proceedings.mlr.press/v162/lee22e/lee22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Eunsang family: Lee - given: Joon-Woo family: Lee - given: Junghyun family: Lee - given: Young-Sik family: Kim - given: Yongjune family: Kim - given: Jong-Seon family: No - given: Woosuk family: Choi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12403-12422 id: lee22e issued: date-parts: - 2022 - 6 - 28 firstpage: 12403 lastpage: 12422 published: 2022-06-28 00:00:00 +0000 - title: 'Statistical inference with implicit SGD: proximal Robbins-Monro vs. Polyak-Ruppert' abstract: 'The implicit stochastic gradient descent (ISGD), a proximal version of SGD, is gaining interest in the literature due to its stability over (explicit) SGD. In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. Specifically, we derive non-asymptotic point estimation error bounds of both proxRM and proxPR iterates and their limiting distributions, and propose on-line estimators of their asymptotic covariance matrices that require only a single run of ISGD. The latter estimators are used to construct valid confidence intervals for the model parameters. Our analysis is free of the generalized linear model assumption that has limited the preceding analyses, and employs feasible procedures. Our on-line covariance matrix estimators appear to be the first of this kind in the ISGD literature.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22f.html PDF: https://proceedings.mlr.press/v162/lee22f/lee22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yoonhyung family: Lee - given: Sungdong family: Lee - given: Joong-Ho family: Won editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12423-12454 id: lee22f issued: date-parts: - 2022 - 6 - 28 firstpage: 12423 lastpage: 12454 published: 2022-06-28 00:00:00 +0000 - title: 'Maslow’s Hammer in Catastrophic Forgetting: Node Re-Use vs. Node Activation' abstract: 'Continual learning—learning new tasks in sequence while maintaining performance on old tasks—remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon that we name Maslow’s Hammer hypothesis. 
Our analysis reveals the presence of a trade-off between node activation and node re-use that results in worst forgetting in the intermediate regime. Using this understanding we reinterpret popular algorithmic interventions for catastrophic interference in terms of this trade-off, and identify the regimes in which they are most effective.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22g.html PDF: https://proceedings.mlr.press/v162/lee22g/lee22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sebastian family: Lee - given: Stefano Sarao family: Mannelli - given: Claudia family: Clopath - given: Sebastian family: Goldt - given: Andrew family: Saxe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12455-12477 id: lee22g issued: date-parts: - 2022 - 6 - 28 firstpage: 12455 lastpage: 12477 published: 2022-06-28 00:00:00 +0000 - title: 'Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization' abstract: 'We focus on the problem of adversarial attacks against models on discrete sequential data in the black-box setting where the attacker aims to craft adversarial examples with limited query access to the victim model. Existing black-box attacks, mostly based on greedy algorithms, find adversarial examples using pre-computed key positions to perturb, which severely limits the search space and might result in suboptimal solutions. To this end, we propose a query-efficient black-box attack using Bayesian optimization, which dynamically computes important positions using an automatic relevance determination (ARD) categorical kernel. We introduce block decomposition and history subsampling techniques to improve the scalability of Bayesian optimization when an input sequence becomes long. Moreover, we develop a post-optimization algorithm that finds adversarial examples with smaller perturbation size. Experiments on natural language and protein classification tasks demonstrate that our method consistently achieves higher attack success rate with significant reduction in query count and modification rate compared to the previous state-of-the-art methods.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22h.html PDF: https://proceedings.mlr.press/v162/lee22h/lee22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Deokjae family: Lee - given: Seungyong family: Moon - given: Junhyeok family: Lee - given: Hyun Oh family: Song editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12478-12497 id: lee22h issued: date-parts: - 2022 - 6 - 28 firstpage: 12478 lastpage: 12497 published: 2022-06-28 00:00:00 +0000 - title: 'Least Squares Estimation using Sketched Data with Heteroskedastic Errors' abstract: 'Researchers may perform regressions using a sketch of data of size m instead of the full sample of size n for a variety of reasons. 
This paper considers the case when the regression errors do not have constant variance and heteroskedasticity robust standard errors would normally be needed for test statistics to provide accurate inference. We show that estimates using data sketched by random projections will behave ’as if’ the errors were homoskedastic. Estimation by random sampling would not have this property. The result arises because the sketched estimates in the case of random projections can be expressed as degenerate U-statistics, and under certain conditions, these statistics are asymptotically normal with homoskedastic variance. We verify that the conditions hold not only in the case of least squares regression when the covariates are exogenous, but also in instrumental variables estimation when the covariates are endogenous. The result implies that inference can be simpler than the full sample case if the sketching scheme is appropriately chosen.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22i.html PDF: https://proceedings.mlr.press/v162/lee22i/lee22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sokbae family: Lee - given: Serena family: Ng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12498-12520 id: lee22i issued: date-parts: - 2022 - 6 - 28 firstpage: 12498 lastpage: 12520 published: 2022-06-28 00:00:00 +0000 - title: 'Why the Rich Get Richer? On the Balancedness of Random Partition Models' abstract: 'Random partition models are widely used in Bayesian methods for various clustering tasks, such as mixture models, topic models, and community detection problems. While the number of clusters induced by random partition models has been studied extensively, another important model property regarding the balancedness of partition has been largely neglected. We formulate a framework to define and theoretically study the balancedness of exchangeable random partition models, by analyzing how a model assigns probabilities to partitions with different levels of balancedness. We demonstrate that the "rich-get-richer" characteristic of many existing popular random partition models is an inevitable consequence of two common assumptions: product-form exchangeability and projectivity. We propose a principled way to compare the balancedness of random partition models, which gives a better understanding of what model works better and what doesn’t for different applications. We also introduce the "rich-get-poorer" random partition models and illustrate their application to entity resolution tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/lee22j.html PDF: https://proceedings.mlr.press/v162/lee22j/lee22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Changwoo J family: Lee - given: Huiyan family: Sang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12521-12541 id: lee22j issued: date-parts: - 2022 - 6 - 28 firstpage: 12521 lastpage: 12541 published: 2022-06-28 00:00:00 +0000 - title: 'Model Selection in Batch Policy Optimization' abstract: 'We study the problem of model selection in batch policy optimization: given a fixed, partial-feedback dataset and M model classes, learn a policy with performance that is competitive with the policy derived from the best model class. We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade-off in order to be competitive: (1) approximation error, (2) statistical complexity, and (3) coverage. The first two sources are common in model selection for supervised learning, where optimally trading off these two is well-studied. In contrast, the third source is unique to batch policy optimization and is due to dataset shift inherent to the setting. We first show that no batch policy optimization algorithm can achieve a guarantee addressing all three simultaneously, revealing a stark contrast between difficulties in batch policy optimization and the positive results available in supervised learning. Despite this negative result, we show that relaxing any one of the three error sources enables the design of algorithms achieving near-oracle inequalities for the remaining two. We conclude with experiments demonstrating the efficacy of these algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/lee22k.html PDF: https://proceedings.mlr.press/v162/lee22k/lee22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lee22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jonathan family: Lee - given: George family: Tucker - given: Ofir family: Nachum - given: Bo family: Dai editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12542-12569 id: lee22k issued: date-parts: - 2022 - 6 - 28 firstpage: 12542 lastpage: 12569 published: 2022-06-28 00:00:00 +0000 - title: 'Supervised Learning with General Risk Functionals' abstract: 'Standard uniform convergence results bound the generalization gap of the expected loss over a hypothesis class. The emergence of risk-sensitive learning requires generalization guarantees for functionals of the loss distribution beyond the expectation. While prior works specialize in uniform convergence of particular functionals, our work provides uniform convergence for a general class of Hölder risk functionals for which the closeness in the Cumulative Distribution Function (CDF) entails closeness in risk. 
We establish the first uniform convergence results for estimating the CDF of the loss distribution, which yield uniform convergence guarantees that hold simultaneously both over a class of Hölder risk functionals and over a hypothesis class. Thus licensed to perform empirical risk minimization, we develop practical gradient-based methods for minimizing distortion risks (widely studied subset of Hölder risks that subsumes the spectral risks, including the mean, conditional value at risk, cumulative prospect theory risks, and others) and provide convergence guarantees. In experiments, we demonstrate the efficacy of our learning procedure, both in settings where uniform convergence results hold and in high-dimensional settings with deep networks.' volume: 162 URL: https://proceedings.mlr.press/v162/leqi22a.html PDF: https://proceedings.mlr.press/v162/leqi22a/leqi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-leqi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liu family: Leqi - given: Audrey family: Huang - given: Zachary family: Lipton - given: Kamyar family: Azizzadenesheli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12570-12592 id: leqi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12570 lastpage: 12592 published: 2022-06-28 00:00:00 +0000 - title: 'Generalized Strategic Classification and the Case of Aligned Incentives' abstract: 'Strategic classification studies learning in settings where self-interested users can strategically modify their features to obtain favorable predictive outcomes. A key working assumption, however, is that “favorable” always means “positive”; this may be appropriate in some applications (e.g., loan approval), but reduces to a fairly narrow view of what user interests can be. In this work we argue for a broader perspective on what accounts for strategic user behavior, and propose and study a flexible model of generalized strategic classification. Our generalized model subsumes most current models but includes other novel settings; among these, we identify and target one intriguing sub-class of problems in which the interests of users and the system are aligned. This setting reveals a surprising fact: that standard max-margin losses are ill-suited for strategic inputs. Returning to our fully generalized model, we propose a novel max-margin framework for strategic learning that is practical and effective, and which we analyze theoretically. We conclude with a set of experiments that empirically demonstrate the utility of our approach.' 
volume: 162 URL: https://proceedings.mlr.press/v162/levanon22a.html PDF: https://proceedings.mlr.press/v162/levanon22a/levanon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-levanon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sagi family: Levanon - given: Nir family: Rosenfeld editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12593-12618 id: levanon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12593 lastpage: 12618 published: 2022-06-28 00:00:00 +0000 - title: 'A Simple Unified Framework for High Dimensional Bandit Problems' abstract: 'Stochastic high dimensional bandit problems with low dimensional structures are useful in different applications such as online advertising and drug discovery. In this work, we propose a simple unified algorithm for such problems and present a general analysis framework for the regret upper bound of our algorithm. We show that under some mild unified assumptions, our algorithm can be applied to different high-dimensional bandit problems. Our framework utilizes the low dimensional structure to guide the parameter estimation in the problem, therefore our algorithm achieves the comparable regret bounds in the LASSO bandit as a sanity check, as well as novel bounds that depend logarithmically on dimensions in the low-rank matrix bandit, the group sparse matrix bandit, and in a new problem: the multi-agent LASSO bandit.' volume: 162 URL: https://proceedings.mlr.press/v162/li22a.html PDF: https://proceedings.mlr.press/v162/li22a/li22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wenjie family: Li - given: Adarsh family: Barik - given: Jean family: Honorio editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12619-12655 id: li22a issued: date-parts: - 2022 - 6 - 28 firstpage: 12619 lastpage: 12655 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Training of Neural Networks Using Scale Invariant Architectures' abstract: 'In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models. However, the use of adaptivity not only comes at the cost of extra memory but also raises the fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve both robust and memory-efficient training via the following general recipe: (1) modify the architecture and make it scale invariant, (2) train with SGD and weight decay, and optionally (3) clip the global gradient norm proportional to weight norm multiplied by $\sqrt{\frac{2\lambda}{\eta}}$, where $\eta$ is learning rate and $\lambda$ is weight decay. 
We show that this general approach is robust to rescaling of parameter and loss by proving that its convergence only depends logarithmically on the scale of initialization and loss, whereas the standard SGD might not even converge for many initializations. Following our recipe, we design a scale invariant version of BERT, called SIBERT, which when trained simply by vanilla SGD achieves performance comparable to BERT trained by adaptive methods like Adam on downstream tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/li22b.html PDF: https://proceedings.mlr.press/v162/li22b/li22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhiyuan family: Li - given: Srinadh family: Bhojanapalli - given: Manzil family: Zaheer - given: Sashank family: Reddi - given: Sanjiv family: Kumar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12656-12684 id: li22b issued: date-parts: - 2022 - 6 - 28 firstpage: 12656 lastpage: 12684 published: 2022-06-28 00:00:00 +0000 - title: 'Spatial-Channel Token Distillation for Vision MLPs' abstract: 'Recently, neural architectures with all Multi-layer Perceptrons (MLPs) have attracted great research interest from the computer vision community. However, the inefficient mixing of spatial-channel information causes MLP-like vision models to demand tremendous pre-training on large-scale datasets. This work solves the problem from a novel knowledge distillation perspective. We propose a novel Spatial-channel Token Distillation (STD) method, which improves the information mixing in the two dimensions by introducing distillation tokens to each of them. A mutual information regularization is further introduced to let distillation tokens focus on their specific dimensions and maximize the performance gain. Extensive experiments on ImageNet for several MLP-like architectures demonstrate that the proposed token distillation mechanism can efficiently improve the accuracy. For example, the proposed STD boosts the top-1 accuracy of Mixer-S16 on ImageNet from 73.8% to 75.7% without any costly pre-training on JFT-300M. When applied to stronger architectures, e.g. CycleMLP-B1 and CycleMLP-B2, STD can still harvest about 1.1% and 0.5% accuracy gains, respectively.' 
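Step (3) of the scale-invariant training recipe in the SIBERT abstract above (clip the global gradient norm at the weight norm times $\sqrt{2\lambda/\eta}$) can be read as the following PyTorch-style sketch. The helper name and the use of `clip_grad_norm_` are my own assumptions, not code from the paper.

```python
import math
import torch

def clip_grad_scale_invariant(parameters, lr, weight_decay):
    """Optional step (3) of the recipe: clip the global gradient norm at
    ||w|| * sqrt(2 * weight_decay / lr) before the SGD + weight-decay step."""
    params = [p for p in parameters if p.grad is not None]
    weight_norm = torch.norm(torch.stack([p.detach().norm() for p in params]))
    max_norm = float(weight_norm) * math.sqrt(2.0 * weight_decay / lr)
    torch.nn.utils.clip_grad_norm_(params, max_norm)
```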
volume: 162 URL: https://proceedings.mlr.press/v162/li22c.html PDF: https://proceedings.mlr.press/v162/li22c/li22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yanxi family: Li - given: Xinghao family: Chen - given: Minjing family: Dong - given: Yehui family: Tang - given: Yunhe family: Wang - given: Chang family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12685-12695 id: li22c issued: date-parts: - 2022 - 6 - 28 firstpage: 12685 lastpage: 12695 published: 2022-06-28 00:00:00 +0000 - title: 'An Analytical Update Rule for General Policy Optimization' abstract: 'We present an analytical policy update rule that is independent of parametric function approximators. The policy update rule is suitable for optimizing general stochastic policies and has a monotonic improvement guarantee. It is derived from a closed-form solution to trust-region optimization using calculus of variation, following a new theoretical result that tightens existing bounds for policy improvement using trust-region methods. The update rule builds a connection between policy search methods and value function methods. Moreover, off-policy reinforcement learning algorithms can be derived from the update rule since it does not need to compute integration over on-policy states. In addition, the update rule extends immediately to cooperative multi-agent systems when policy updates are performed by one agent at a time.' volume: 162 URL: https://proceedings.mlr.press/v162/li22d.html PDF: https://proceedings.mlr.press/v162/li22d/li22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hepeng family: Li - given: Nicholas family: Clavette - given: Haibo family: He editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12696-12716 id: li22d issued: date-parts: - 2022 - 6 - 28 firstpage: 12696 lastpage: 12716 published: 2022-06-28 00:00:00 +0000 - title: 'On Convergence of Gradient Descent Ascent: A Tight Local Analysis' abstract: 'Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_{x} \max_{y} f(x;y)$ where $f$ is strongly-concave in $y$ and possibly nonconvex in $x$, (Lin et al., 2020) proved the convergence of GDA with a stepsize ratio $\eta_y/\eta_x=\Theta(\kappa^2)$ where $\eta_x$ and $\eta_y$ are the stepsizes for $x$ and $y$ and $\kappa$ is the condition number for $y$. While this stepsize ratio suggests a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the local convergence of general nonconvex-nonconcave minimax problems. 
We demonstrate that a stepsize ratio of $\Theta(\kappa)$ is necessary and sufficient for local convergence of GDA to a Stackelberg Equilibrium, where $\kappa$ is the local condition number for $y$. We prove a nearly tight convergence rate with a matching lower bound. We further extend the convergence guarantees to stochastic GDA and extra-gradient methods (EG). Finally, we conduct several numerical experiments to support our theoretical findings.' volume: 162 URL: https://proceedings.mlr.press/v162/li22e.html PDF: https://proceedings.mlr.press/v162/li22e/li22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haochuan family: Li - given: Farzan family: Farnia - given: Subhro family: Das - given: Ali family: Jadbabaie editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12717-12740 id: li22e issued: date-parts: - 2022 - 6 - 28 firstpage: 12717 lastpage: 12740 published: 2022-06-28 00:00:00 +0000 - title: 'On the Finite-Time Performance of the Knowledge Gradient Algorithm' abstract: 'The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Due to the complex calculation of KG, theoretical analysis of this algorithm is difficult, and existing results are mostly about the asymptotic performance of it, e.g., consistency, asymptotic sample allocation, etc. In this research, we present new theoretical results about the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive lower bounds and upper bounds for the probability of error and simple regret of the algorithm. With these bounds, existing asymptotic results become simple corollaries. We also show the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Last, we use numerical experiments to further demonstrate the finite-time behavior of the KG algorithm.' volume: 162 URL: https://proceedings.mlr.press/v162/li22f.html PDF: https://proceedings.mlr.press/v162/li22f/li22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yanwen family: Li - given: Siyang family: Gao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12741-12764 id: li22f issued: date-parts: - 2022 - 6 - 28 firstpage: 12741 lastpage: 12764 published: 2022-06-28 00:00:00 +0000 - title: 'Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning' abstract: 'It has been a recent trend to leverage the power of supervised learning (SL) towards more effective reinforcement learning (RL) methods. We propose a novel phasic solution by alternating online RL and offline SL for tackling sparse-reward goal-conditioned problems. 
In the online phase, we perform RL training and collect rollout data while in the offline phase, we perform SL on those successful trajectories from the dataset. To further improve sample efficiency, we adopt additional techniques in the online phase including task reduction to generate more feasible trajectories and a value-difference-based intrinsic reward to alleviate the sparse-reward issue. We call this overall framework, PhAsic self-Imitative Reduction (PAIR). PAIR is compatible with various online and offline RL methods and substantially outperforms both non-phasic RL and phasic SL baselines on sparse-reward robotic control problems, including a particularly challenging stacking task. PAIR is the first RL method that learns to stack 6 cubes with only 0/1 success rewards from scratch.' volume: 162 URL: https://proceedings.mlr.press/v162/li22g.html PDF: https://proceedings.mlr.press/v162/li22g/li22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yunfei family: Li - given: Tian family: Gao - given: Jiaqi family: Yang - given: Huazhe family: Xu - given: Yi family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12765-12781 id: li22g issued: date-parts: - 2022 - 6 - 28 firstpage: 12765 lastpage: 12781 published: 2022-06-28 00:00:00 +0000 - title: 'G$^2$CN: Graph Gaussian Convolution Networks with Concentrated Graph Filters' abstract: 'Recently, linear GCNs have shown competitive performance against non-linear ones with less computation cost, and the key lies in their propagation layers. Spectral analysis has been widely adopted in designing and analyzing existing graph propagations. Nevertheless, we notice that existing spectral analysis fails to explain why existing graph propagations with the same global tendency, such as low-pass or high-pass, still yield very different results. Motivated by this situation, we develop a new framework for spectral analysis in this paper called concentration analysis. In particular, we propose three attributes: concentration centre, maximum response, and bandwidth for our analysis. Through a dissection of the limitations of existing graph propagations via the above analysis, we propose a new kind of propagation layer, Graph Gaussian Convolution Networks (G^2CN), in which the three properties are decoupled and the whole structure becomes more flexible and applicable to different kinds of graphs. Extensive experiments show that we can obtain state-of-the-art performance on heterophily and homophily datasets with our proposed G^2CN.' 
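As a rough illustration of the concentration view in the G$^2$CN abstract above (concentration centre, maximum response, bandwidth), a Gaussian spectral filter on a small graph can be written directly in terms of the normalized Laplacian's eigendecomposition. This is a generic sketch with hypothetical parameter names and a dense eigendecomposition, not the paper's propagation layer.

```python
import numpy as np

def gaussian_spectral_filter(adj, features, centre=0.0, bandwidth=10.0):
    """Filter node features with exp(-bandwidth * (lambda - centre)^2) applied
    to the eigenvalues of the symmetric normalized Laplacian (small graphs only)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(lap)                          # spectrum lies in [0, 2]
    response = np.exp(-bandwidth * (lam - centre) ** 2)   # concentrated around centre
    return U @ (response[:, None] * (U.T @ features))
```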
volume: 162 URL: https://proceedings.mlr.press/v162/li22h.html PDF: https://proceedings.mlr.press/v162/li22h/li22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mingjie family: Li - given: Xiaojun family: Guo - given: Yifei family: Wang - given: Yisen family: Wang - given: Zhouchen family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12782-12796 id: li22h issued: date-parts: - 2022 - 6 - 28 firstpage: 12782 lastpage: 12796 published: 2022-06-28 00:00:00 +0000 - title: 'Decomposing Temporal High-Order Interactions via Latent ODEs' abstract: 'High-order interactions between multiple objects are common in real-world applications. Although tensor decomposition is a popular framework for high-order interaction analysis and prediction, most methods cannot well exploit the valuable timestamp information in data. The existent methods either discard the timestamps or convert them into discrete steps or use over-simplistic decomposition models. As a result, these methods might not be capable enough of capturing complex, fine-grained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). We model the time-varying interaction result with a latent ODE. To capture the complex temporal dynamics, we use a neural network (NN) to learn the time derivative of the ODE state. We use the representation of the interaction objects to model the initial value of the ODE and to constitute a part of the NN input to compute the state. In this way, the temporal relationships of the participant objects can be estimated and encoded into their representations. For tractable and scalable inference, we use forward sensitivity analysis to efficiently compute the gradient of ODE state, based on which we use integral transform to develop a stochastic mini-batch learning algorithm. We demonstrate the advantage of our approach in simulation and four real-world applications.' volume: 162 URL: https://proceedings.mlr.press/v162/li22i.html PDF: https://proceedings.mlr.press/v162/li22i/li22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shibo family: Li - given: Robert family: Kirby - given: Shandian family: Zhe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12797-12812 id: li22i issued: date-parts: - 2022 - 6 - 28 firstpage: 12797 lastpage: 12812 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Inverse Transform Sampler' abstract: 'Any explicit functional representation $f$ of a density is hampered by two main obstacles when we wish to use it as a generative model: designing $f$ so that sampling is fast, and estimating $Z = \int f$ so that $Z^{-1}f$ integrates to 1. 
This becomes increasingly complicated as $f$ itself becomes complicated. In this paper, we show that when modeling one-dimensional conditional densities with a neural network, $Z$ can be exactly and efficiently computed by letting the network represent the cumulative distribution function of a target density, and applying a generalized fundamental theorem of calculus. We also derive a fast algorithm for sampling from the resulting representation by the inverse transform method. By extending these principles to higher dimensions, we introduce the \textbf{Neural Inverse Transform Sampler (NITS)}, a novel deep learning framework for modeling and sampling from general, multidimensional, compactly-supported probability densities. NITS is a highly expressive density estimator that boasts end-to-end differentiability, fast sampling, and exact and cheap likelihood evaluation. We demonstrate the applicability of NITS by applying it to realistic, high-dimensional density estimation tasks: likelihood-based generative modeling on the CIFAR-10 dataset, and density estimation on the UCI suite of benchmark datasets, where NITS produces compelling results rivaling or surpassing the state of the art.' volume: 162 URL: https://proceedings.mlr.press/v162/li22j.html PDF: https://proceedings.mlr.press/v162/li22j/li22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Henry family: Li - given: Yuval family: Kluger editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12813-12825 id: li22j issued: date-parts: - 2022 - 6 - 28 firstpage: 12813 lastpage: 12825 published: 2022-06-28 00:00:00 +0000 - title: 'PLATINUM: Semi-Supervised Model Agnostic Meta-Learning using Submodular Mutual Information' abstract: 'Few-shot classification (FSC) requires training models using a few (typically one to five) data points per class. Meta-learning has proven to be able to learn a parametrized model for FSC by training on various other classification tasks. In this work, we propose PLATINUM (semi-suPervised modeL Agnostic meTa learnIng usiNg sUbmodular Mutual information), a novel semi-supervised model agnostic meta learning framework that uses the submodular mutual information (SMI) functions to boost the performance of FSC. PLATINUM leverages unlabeled data in the inner and outer loop using SMI functions during meta-training and obtains richer meta-learned parameterizations. We study the performance of PLATINUM in two scenarios - 1) where the unlabeled data points belong to the same set of classes as the labeled set of a certain episode, and 2) where there exist out-of-distribution classes that do not belong to the labeled set. We evaluate our method on various settings on the miniImageNet, tieredImageNet and CIFAR-FS datasets. Our experiments show that PLATINUM outperforms MAML and semi-supervised approaches like pseudo-labeling for semi-supervised FSC, especially for small ratios of labeled to unlabeled samples.'
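The one-dimensional mechanism described in the NITS abstract above (represent the CDF with a network, sample by inverting it) can be sketched with any monotone callable standing in for the network. The bisection inversion and the logistic stand-in below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def inverse_transform_sample(cdf, low, high, n_samples, rng, tol=1e-8):
    """Draw samples from a 1-D density on [low, high] whose CDF is `cdf`,
    by solving cdf(x) = u with bisection for u ~ Uniform(cdf(low), cdf(high))."""
    out = []
    for u in rng.uniform(cdf(low), cdf(high), size=n_samples):
        a, b = low, high
        while b - a > tol:
            mid = 0.5 * (a + b)
            a, b = (mid, b) if cdf(mid) < u else (a, mid)
        out.append(0.5 * (a + b))
    return np.array(out)

rng = np.random.default_rng(0)
logistic_cdf = lambda x: 1.0 / (1.0 + np.exp(-x))   # stand-in for a learned CDF
draws = inverse_transform_sample(logistic_cdf, -10.0, 10.0, 1000, rng)
```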
volume: 162 URL: https://proceedings.mlr.press/v162/li22k.html PDF: https://proceedings.mlr.press/v162/li22k/li22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Changbin family: Li - given: Suraj family: Kothawade - given: Feng family: Chen - given: Rishabh family: Iyer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12826-12842 id: li22k issued: date-parts: - 2022 - 6 - 28 firstpage: 12826 lastpage: 12842 published: 2022-06-28 00:00:00 +0000 - title: 'Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning' abstract: 'Value decomposition (VD) methods have been widely used in cooperative multi-agent reinforcement learning (MARL), where credit assignment plays an important role in guiding the agents’ decentralized execution. In this paper, we investigate VD from a novel perspective of causal inference. We first show that the environment in existing VD methods is an unobserved confounder as the common cause factor of the global state and the joint value function, which leads to the confounding bias on learning credit assignment. We then present our approach, deconfounded value decomposition (DVD), which cuts off the backdoor confounding path from the global state to the joint value function. The cut is implemented by introducing the trajectory graph, which depends only on the local trajectories, as a proxy confounder. DVD is general enough to be applied to various VD methods, and extensive experiments show that DVD can consistently achieve significant performance gains over different state-of-the-art VD methods on StarCraft II and MACO benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/li22l.html PDF: https://proceedings.mlr.press/v162/li22l/li22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiahui family: Li - given: Kun family: Kuang - given: Baoxiang family: Wang - given: Furui family: Liu - given: Long family: Chen - given: Changjie family: Fan - given: Fei family: Wu - given: Jun family: Xiao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12843-12856 id: li22l issued: date-parts: - 2022 - 6 - 28 firstpage: 12843 lastpage: 12856 published: 2022-06-28 00:00:00 +0000 - title: 'C-MinHash: Improving Minwise Hashing with Circulant Permutation' abstract: 'Minwise hashing (MinHash) is an important and practical algorithm for generating random hashes to approximate the Jaccard (resemblance) similarity in massive binary (0/1) data. The basic theory of MinHash requires applying hundreds or even thousands of independent random permutations to each data vector in the dataset, in order to obtain reliable results for (e.g.,) building large-scale learning models or approximate near neighbor search. 
In this paper, we propose Circulant MinHash (C-MinHash) and provide the surprising theoretical results that using only two independent random permutations in a circulant manner leads to uniformly smaller Jaccard estimation variance than that of the classical MinHash with K independent permutations. Experiments are conducted to show the effectiveness of the proposed method. We also propose a more convenient C-MinHash variant which reduces two permutations to just one, with extensive numerical results to validate that it achieves essentially the same estimation accuracy as using two permutations.' volume: 162 URL: https://proceedings.mlr.press/v162/li22m.html PDF: https://proceedings.mlr.press/v162/li22m/li22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaoyun family: Li - given: Ping family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12857-12887 id: li22m issued: date-parts: - 2022 - 6 - 28 firstpage: 12857 lastpage: 12887 published: 2022-06-28 00:00:00 +0000 - title: 'BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation' abstract: 'Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code and models are available at https://github.com/salesforce/BLIP.' 
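For reference against the C-MinHash abstract above, this is the classical K-permutation MinHash estimator of Jaccard similarity that the paper improves upon with circulant reuse of two permutations; the data dimensions and permutation count below are illustrative.

```python
import numpy as np

def minhash_signature(binary_vec, perms):
    """Classical MinHash: for each permutation, the smallest permuted index
    among the nonzero coordinates of the binary vector."""
    nz = np.flatnonzero(binary_vec)
    return np.array([perm[nz].min() for perm in perms])

rng = np.random.default_rng(0)
d, K = 1000, 256
perms = [rng.permutation(d) for _ in range(K)]   # K independent permutations
x = (rng.random(d) < 0.05).astype(int)
y = x.copy()
y[rng.choice(d, 30, replace=False)] ^= 1         # flip a few bits
jaccard_estimate = (minhash_signature(x, perms) == minhash_signature(y, perms)).mean()
```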
volume: 162 URL: https://proceedings.mlr.press/v162/li22n.html PDF: https://proceedings.mlr.press/v162/li22n/li22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junnan family: Li - given: Dongxu family: Li - given: Caiming family: Xiong - given: Steven family: Hoi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12888-12900 id: li22n issued: date-parts: - 2022 - 6 - 28 firstpage: 12888 lastpage: 12900 published: 2022-06-28 00:00:00 +0000 - title: 'Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity' abstract: 'This paper studies the accelerated gradient descent for general nonconvex problems under the gradient Lipschitz and Hessian Lipschitz assumptions. We establish that a simple restarted accelerated gradient descent (AGD) finds an $\epsilon$-approximate first-order stationary point in $O(\epsilon^{-7/4})$ gradient computations with simple proofs. Our complexity does not hide any polylogarithmic factors, and thus it improves over the state-of-the-art one by the $O(\log\frac{1}{\epsilon})$ factor. Our simple algorithm only consists of Nesterov’s classical AGD and a restart mechanism, and it does not need the negative curvature exploitation or the optimization of regularized surrogate functions. Technically, our simple proof does not invoke the analysis for the strongly convex AGD, which is crucial to remove the $O(\log\frac{1}{\epsilon})$ factor.' volume: 162 URL: https://proceedings.mlr.press/v162/li22o.html PDF: https://proceedings.mlr.press/v162/li22o/li22o.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22o.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Huan family: Li - given: Zhouchen family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12901-12916 id: li22o issued: date-parts: - 2022 - 6 - 28 firstpage: 12901 lastpage: 12916 published: 2022-06-28 00:00:00 +0000 - title: 'Achieving Fairness at No Utility Cost via Data Reweighing with Influence' abstract: 'With the fast development of algorithmic governance, fairness has become a compulsory property for machine learning models to suppress unintentional discrimination. In this paper, we focus on the pre-processing aspect for achieving fairness, and propose a data reweighing approach that only adjusts the weight for samples in the training phase. Different from most previous reweighing methods which usually assign a uniform weight for each (sub)group, we granularly model the influence of each training sample with regard to fairness-related quantity and predictive utility, and compute individual weights based on influence under the constraints from both fairness and utility.
Experimental results reveal that previous methods achieve fairness at a non-negligible cost of utility, while as a significant advantage, our approach can empirically release the tradeoff and obtain cost-free fairness for equal opportunity. We demonstrate the cost-free fairness through vanilla classifiers and standard training processes, compared to baseline methods on multiple real-world tabular datasets. Code available at https://github.com/brandeis-machine-learning/influence-fairness.' volume: 162 URL: https://proceedings.mlr.press/v162/li22p.html PDF: https://proceedings.mlr.press/v162/li22p/li22p.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22p.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peizhao family: Li - given: Hongfu family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12917-12930 id: li22p issued: date-parts: - 2022 - 6 - 28 firstpage: 12917 lastpage: 12930 published: 2022-06-28 00:00:00 +0000 - title: 'High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails' abstract: 'Stochastic gradient descent (SGD) is the workhorse in modern machine learning and data-driven optimization. Despite its popularity, existing theoretical guarantees for SGD are mainly derived in expectation and for convex learning problems. High probability guarantees of nonconvex SGD are scarce, and typically rely on “light-tail” noise assumptions and study the optimization and generalization performance separately. In this paper, we develop high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance. Instead of the light tail assumption, we consider the gradient noise following a heavy-tailed sub-Weibull distribution, a novel class generalizing the sub-Gaussian and sub-Exponential families to potentially heavier-tailed distributions. Under these complicated settings, we first present high probability bounds with best-known rates in general nonconvex learning, then move to nonconvex learning with a gradient dominance curvature condition, for which we improve the learning guarantees to fast rates. We further obtain sharper learning guarantees by considering a mild Bernstein-type noise condition. Our analysis also reveals the effect of trade-offs between the optimization and generalization performance under different conditions. In the last, we show that gradient clipping can be employed to remove the bounded gradient-type assumptions. Additionally, in this case, the stepsize of SGD is completely oblivious to the knowledge of smoothness.' 
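The gradient-clipping device mentioned at the end of the heavy-tailed SGD abstract above is, at its core, the standard clipped SGD update; a minimal numpy sketch follows, with the clipping threshold left as a free parameter rather than the paper's specific choice.

```python
import numpy as np

def clipped_sgd_step(w, grad, lr, clip_threshold):
    """One SGD step with the gradient norm clipped at `clip_threshold`,
    which removes the need for bounded-gradient assumptions."""
    norm = np.linalg.norm(grad)
    if norm > clip_threshold:
        grad = grad * (clip_threshold / norm)
    return w - lr * grad
```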
volume: 162 URL: https://proceedings.mlr.press/v162/li22q.html PDF: https://proceedings.mlr.press/v162/li22q/li22q.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22q.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shaojie family: Li - given: Yong family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12931-12963 id: li22q issued: date-parts: - 2022 - 6 - 28 firstpage: 12931 lastpage: 12963 published: 2022-06-28 00:00:00 +0000 - title: 'MetAug: Contrastive Learning via Meta Feature Augmentation' abstract: 'What matters for contrastive learning? We argue that contrastive learning heavily relies on informative features, or “hard” (positive or negative) features. Early works include more informative features by applying complex data augmentations and large batch size or memory bank, and recent works design elaborate sampling approaches to explore informative features. The key challenge toward exploring such features is that the source multi-view data is generated by applying random data augmentations, making it infeasible to always add useful information in the augmented data. Consequently, the informativeness of features learned from such augmented data is limited. In response, we propose to directly augment the features in latent space, thereby learning discriminative representations without a large amount of input data. We perform a meta learning technique to build the augmentation generator that updates its network parameters by considering the performance of the encoder. However, insufficient input data may lead the encoder to learn collapsed features and therefore malfunction the augmentation generator. A new margin-injected regularization is further added in the objective function to avoid the encoder learning a degenerate mapping. To contrast all features in one gradient back-propagation step, we adopt the proposed optimization-driven unified contrastive loss instead of the conventional contrastive loss. Empirically, our method achieves state-of-the-art results on several benchmark datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/li22r.html PDF: https://proceedings.mlr.press/v162/li22r/li22r.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22r.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiangmeng family: Li - given: Wenwen family: Qiang - given: Changwen family: Zheng - given: Bing family: Su - given: Hui family: Xiong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12964-12978 id: li22r issued: date-parts: - 2022 - 6 - 28 firstpage: 12964 lastpage: 12978 published: 2022-06-28 00:00:00 +0000 - title: 'PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration' abstract: 'Learning to collaborate is critical in Multi-Agent Reinforcement Learning (MARL). 
Previous works promote collaboration by maximizing the correlation of agents’ behaviors, which is typically characterized by Mutual Information (MI) in different forms. However, we reveal sub-optimal collaborative behaviors also emerge with strong correlations, and simply maximizing the MI can, surprisingly, hinder the learning towards better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is maximizing the MI associated with superior collaborative behaviors and minimizing the MI associated with inferior ones. The two MI objectives play complementary roles by facilitating better collaborations while avoiding falling into sub-optimal ones. Experiments on a wide range of MARL benchmarks show the superior performance of PMIC compared with other algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/li22s.html PDF: https://proceedings.mlr.press/v162/li22s/li22s.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22s.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pengyi family: Li - given: Hongyao family: Tang - given: Tianpei family: Yang - given: Xiaotian family: Hao - given: Tong family: Sang - given: Yan family: Zheng - given: Jianye family: Hao - given: Matthew E. family: Taylor - given: Wenyuan family: Tao - given: Zhen family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12979-12997 id: li22s issued: date-parts: - 2022 - 6 - 28 firstpage: 12979 lastpage: 12997 published: 2022-06-28 00:00:00 +0000 - title: 'CerDEQ: Certifiable Deep Equilibrium Model' abstract: 'Recently, certifiable robust training methods via bound propagation have been proposed for training neural networks with certifiable robustness guarantees. However, no neural architectures with regular convolution and linear layers perform better in the certifiable training than the plain CNNs, since the output bounds for the deep explicit models increase quickly as their depth increases. And such a phenomenon significantly hinders certifiable training. Meanwhile, the Deep Equilibrium Model (DEQ) is more representative and robust due to their equivalent infinite depth and controllable global Lipschitz. But no work has been proposed to explore whether DEQ can show advantages in certified training. In this work, we aim to tackle the problem of DEQ’s certified training. To obtain the output bound based on the bound propagation scheme in the implicit model, we first involve the adjoint DEQ for bound approximation. Furthermore, we also use the weight orthogonalization method and other tricks specified for DEQ to stabilize the certifiable training. With our approach, we can obtain the certifiable DEQ called CerDEQ. Our CerDEQ can achieve state-of-the-art performance compared with models using regular convolution and linear layers on $\ell_\infty$ tasks with $\epsilon=8/255$: $64.72%$ certified error for CIFAR-$10$ and $94.45%$ certified error for Tiny ImageNet.' 
volume: 162 URL: https://proceedings.mlr.press/v162/li22t.html PDF: https://proceedings.mlr.press/v162/li22t/li22t.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22t.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mingjie family: Li - given: Yisen family: Wang - given: Zhouchen family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 12998-13013 id: li22t issued: date-parts: - 2022 - 6 - 28 firstpage: 12998 lastpage: 13013 published: 2022-06-28 00:00:00 +0000 - title: 'Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling' abstract: 'Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data. To address its scalability issue due to the recursive embedding of neighboring features, graph topology sampling has been proposed to reduce the memory and computational cost of training GCNs, and it has achieved comparable test performance to those without topology sampling in many empirical studies. To the best of our knowledge, this paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs for semi-supervised node classification. We formally characterize some sufficient conditions on graph topology sampling such that GCN training leads to diminishing generalization error. Moreover, our method tackles the non-convex interaction of weights across layers, which is under-explored in the existing theoretical analyses of GCNs. This paper characterizes the impact of graph structures and topology sampling on the generalization performance and sample complexity explicitly, and the theoretical findings are also justified through numerical experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/li22u.html PDF: https://proceedings.mlr.press/v162/li22u/li22u.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22u.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hongkang family: Li - given: Meng family: Wang - given: Sijia family: Liu - given: Pin-Yu family: Chen - given: Jinjun family: Xiong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13014-13051 id: li22u issued: date-parts: - 2022 - 6 - 28 firstpage: 13014 lastpage: 13051 published: 2022-06-28 00:00:00 +0000 - title: 'Let Invariant Rationale Discovery Inspire Graph Contrastive Learning' abstract: 'Leading graph contrastive learning (GCL) methods perform graph augmentations in two fashions: (1) randomly corrupting the anchor graph, which could cause the loss of semantic information, or (2) using domain knowledge to maintain salient features, which undermines the generalization to other domains. Taking an invariance look at GCL, we argue that a high-performing augmentation should preserve the salient semantics of anchor graphs regarding instance-discrimination. 
To this end, we relate GCL with invariant rationale discovery, and propose a new framework, Rationale-aware Graph Contrastive Learning (RGCL). Specifically, without supervision signals, RGCL uses a rationale generator to reveal salient features about graph instance-discrimination as the rationale, and then creates rationale-aware views for contrastive learning. This rationale-aware pre-training scheme endows the backbone model with powerful representation ability, further facilitating fine-tuning on downstream tasks. On MNIST-Superpixel and MUTAG datasets, visual inspections on the discovered rationales showcase that the rationale generator successfully captures the salient features (i.e., distinguishing semantic nodes in graphs). On biochemical molecule and social network benchmark datasets, the state-of-the-art performance of RGCL demonstrates the effectiveness of rationale-aware views for contrastive learning. Our code is available at https://github.com/lsh0520/RGCL.' volume: 162 URL: https://proceedings.mlr.press/v162/li22v.html PDF: https://proceedings.mlr.press/v162/li22v/li22v.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22v.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sihang family: Li - given: Xiang family: Wang - given: An family: Zhang - given: Yingxin family: Wu - given: Xiangnan family: He - given: Tat-Seng family: Chua editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13052-13065 id: li22v issued: date-parts: - 2022 - 6 - 28 firstpage: 13052 lastpage: 13065 published: 2022-06-28 00:00:00 +0000 - title: 'Difference Advantage Estimation for Multi-Agent Policy Gradients' abstract: 'Multi-agent policy gradient methods in centralized training with decentralized execution have recently seen much progress. During centralized training, multi-agent credit assignment is crucial, which can substantially promote learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods still receives less attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, which is analogous to GAE, to enable multi-agent credit assignment while allowing the tradeoff with policy bias. Empirical results show that our approach can successfully perform effective multi-agent credit assignment, and thus substantially outperforms other advantage estimators.'
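The estimator in the difference-advantage abstract above is described as analogous to GAE; for orientation, this is the standard single-agent GAE recursion (the paper's multi-agent, reward-shaping-based estimator differs and is not reproduced here).

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation: exponentially weighted sum of TD errors.
    `values` carries one extra bootstrap entry for the state after the last step."""
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```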
volume: 162 URL: https://proceedings.mlr.press/v162/li22w.html PDF: https://proceedings.mlr.press/v162/li22w/li22w.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22w.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yueheng family: Li - given: Guangming family: Xie - given: Zongqing family: Lu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13066-13085 id: li22w issued: date-parts: - 2022 - 6 - 28 firstpage: 13066 lastpage: 13085 published: 2022-06-28 00:00:00 +0000 - title: 'Private Adaptive Optimization with Side information' abstract: 'Adaptive optimization methods have become the default solvers for many machine learning tasks. Unfortunately, the benefits of adaptivity may degrade when training with differential privacy, as the noise added to ensure privacy reduces the effectiveness of the adaptive preconditioner. To this end, we propose AdaDPS, a general framework that uses non-sensitive side information to precondition the gradients, allowing the effective use of adaptive methods in private settings. We formally show AdaDPS reduces the amount of noise needed to achieve similar privacy guarantees, thereby improving optimization performance. Empirically, we leverage simple and readily available side information to explore the performance of AdaDPS in practice, comparing to strong baselines in both centralized and federated settings. Our results show that AdaDPS improves accuracy by 7.7% (absolute) on average—yielding state-of-the-art privacy-utility trade-offs on large-scale text and image benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/li22x.html PDF: https://proceedings.mlr.press/v162/li22x/li22x.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22x.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tian family: Li - given: Manzil family: Zaheer - given: Sashank family: Reddi - given: Virginia family: Smith editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13086-13105 id: li22x issued: date-parts: - 2022 - 6 - 28 firstpage: 13086 lastpage: 13105 published: 2022-06-28 00:00:00 +0000 - title: 'Permutation Search of Tensor Network Structures via Local Sampling' abstract: 'Recent works put much effort into tensor network structure search (TN-SS), aiming to select suitable tensor network (TN) structures, involving the TN-ranks, formats, and so on, for the decomposition or learning tasks. In this paper, we consider a practical variant of TN-SS, dubbed TN permutation search (TN-PS), in which we search for good mappings from tensor modes onto TN vertices (core tensors) for compact TN representations. We conduct a theoretical investigation of TN-PS and propose a practically-efficient algorithm to resolve the problem. Theoretically, we prove the counting and metric properties of search spaces of TN-PS, analyzing for the first time the impact of TN structures on these unique properties. 
Numerically, we propose a novel meta-heuristic algorithm, in which the search is done by randomly sampling in a neighborhood established in our theory, and then recurrently updating the neighborhood until convergence. Numerical results demonstrate that the new algorithm can reduce the required model size of TNs in extensive benchmarks, implying an improvement in the expressive power of TNs. Furthermore, the computational cost for the new algorithm is significantly less than that in (Li and Sun, 2020).' volume: 162 URL: https://proceedings.mlr.press/v162/li22y.html PDF: https://proceedings.mlr.press/v162/li22y/li22y.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22y.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chao family: Li - given: Junhua family: Zeng - given: Zerui family: Tao - given: Qibin family: Zhao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13106-13124 id: li22y issued: date-parts: - 2022 - 6 - 28 firstpage: 13106 lastpage: 13124 published: 2022-06-28 00:00:00 +0000 - title: 'Hessian-Free High-Resolution Nesterov Acceleration For Sampling' abstract: 'Nesterov’s Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed (Shi et al., 2021). This work explores the sampling counterpart of this phenomenon and proposes a diffusion process, whose discretizations can yield accelerated gradient-based MCMC methods. More precisely, we reformulate the optimizer of NAG for strongly convex functions (NAG-SC) as a Hessian-Free High-Resolution ODE, change its high-resolution coefficient to a hyperparameter, inject appropriate noise, and discretize the resulting diffusion process. The acceleration effect of the new hyperparameter is quantified, and it is not an artificial one created by time-rescaling. Instead, acceleration beyond underdamped Langevin in $W_2$ distance is quantitatively established for log-strongly-concave-and-smooth targets, at both the continuous dynamics level and the discrete algorithm level. Empirical experiments in both log-strongly-concave and multi-modal cases also numerically demonstrate this acceleration.'
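The starting point of the Hessian-free high-resolution construction in the abstract above is NAG-SC, Nesterov's method for $\mu$-strongly convex objectives; a plain optimization sketch is below (the paper's sampler additionally rewrites this as a high-resolution ODE and injects noise, which is not shown).

```python
import numpy as np

def nag_sc(grad_f, x0, mu, step, n_iters):
    """Nesterov's accelerated gradient for mu-strongly convex f (NAG-SC)."""
    q = (1.0 - np.sqrt(mu * step)) / (1.0 + np.sqrt(mu * step))
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    for _ in range(n_iters):
        x_new = y - step * grad_f(y)    # gradient step from the lookahead point
        y = x_new + q * (x_new - x)     # momentum with the strongly convex weight
        x = x_new
    return x
```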
volume: 162 URL: https://proceedings.mlr.press/v162/li22z.html PDF: https://proceedings.mlr.press/v162/li22z/li22z.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22z.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruilin family: Li - given: Hongyuan family: Zha - given: Molei family: Tao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13125-13162 id: li22z issued: date-parts: - 2022 - 6 - 28 firstpage: 13125 lastpage: 13162 published: 2022-06-28 00:00:00 +0000 - title: 'Double Sampling Randomized Smoothing' abstract: 'Neural networks (NNs) are known to be vulnerable against adversarial perturbations, and thus there is a line of work aiming to provide robustness certification for NNs, such as randomized smoothing, which samples smoothing noises from a certain distribution to certify the robustness for a smoothed classifier. However, as previous work shows, the certified robust radius in randomized smoothing suffers from scaling to large datasets ("curse of dimensionality"). To overcome this hurdle, we propose a Double Sampling Randomized Smoothing (DSRS) framework, which exploits the sampled probability from an additional smoothing distribution to tighten the robustness certification of the previous smoothed classifier. Theoretically, under mild assumptions, we prove that DSRS can certify $\Theta(\sqrt d)$ robust radius under $\ell_2$ norm where $d$ is the input dimension, which implies that DSRS may be able to break the curse of dimensionality of randomized smoothing. We instantiate DSRS for a generalized family of Gaussian smoothing and propose an efficient and sound computing method based on customized dual optimization considering sampling error. Extensive experiments on MNIST, CIFAR-10, and ImageNet verify our theory and show that DSRS certifies larger robust radii than existing baselines consistently under different settings. Code is available at https://github.com/llylly/DSRS.' volume: 162 URL: https://proceedings.mlr.press/v162/li22aa.html PDF: https://proceedings.mlr.press/v162/li22aa/li22aa.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22aa.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Linyi family: Li - given: Jiawei family: Zhang - given: Tao family: Xie - given: Bo family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13163-13208 id: li22aa issued: date-parts: - 2022 - 6 - 28 firstpage: 13163 lastpage: 13208 published: 2022-06-28 00:00:00 +0000 - title: 'HousE: Knowledge Graph Embedding with Householder Parameterization' abstract: 'The effectiveness of knowledge graph embedding (KGE) largely depends on the ability to model intrinsic relation patterns and mapping properties. However, existing approaches can only capture some of them with insufficient modeling capacity. 
In this work, we propose a more powerful KGE framework named HousE, which involves a novel parameterization based on two kinds of Householder transformations: (1) Householder rotations to achieve superior capacity of modeling relation patterns; (2) Householder projections to handle sophisticated relation mapping properties. Theoretically, HousE is capable of modeling crucial relation patterns and mapping properties simultaneously. Besides, HousE is a generalization of existing rotation-based models while extending the rotations to high-dimensional spaces. Empirically, HousE achieves new state-of-the-art performance on five benchmark datasets. Our code is available at https://github.com/anrep/HousE.' volume: 162 URL: https://proceedings.mlr.press/v162/li22ab.html PDF: https://proceedings.mlr.press/v162/li22ab/li22ab.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22ab.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rui family: Li - given: Jianan family: Zhao - given: Chaozhuo family: Li - given: Di family: He - given: Yiqi family: Wang - given: Yuming family: Liu - given: Hao family: Sun - given: Senzhang family: Wang - given: Weiwei family: Deng - given: Yanming family: Shen - given: Xing family: Xie - given: Qi family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13209-13224 id: li22ab issued: date-parts: - 2022 - 6 - 28 firstpage: 13209 lastpage: 13224 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Multiscale Transformer Models for Sequence Generation' abstract: 'Multiscale feature hierarchies have seen success in the computer vision area. This further motivates researchers to design multiscale Transformers for natural language processing, mostly based on the self-attention mechanism, for example by restricting the receptive field across heads or extracting local fine-grained features via convolutions. However, most existing works directly model local features but ignore word-boundary information. This results in redundant and ambiguous attention distributions, which lack interpretability. In this work, we define those scales in different linguistic units, including sub-words, words and phrases. We built a multiscale Transformer model by establishing relationships among scales based on word-boundary information and phrase-level prior knowledge. The proposed \textbf{U}niversal \textbf{M}ulti\textbf{S}cale \textbf{T}ransformer, namely \textsc{Umst}, was evaluated on two sequence generation tasks. Notably, it yielded consistent performance gains over the strong baseline on several test sets without sacrificing efficiency.'
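The two building blocks named in the HousE abstract above are Householder transformations; a generic sketch of a Householder reflection, and of a rotation composed from two reflections, follows (the way HousE parameterizes relations with them is not reproduced, and the relation-embedding comment is hypothetical).

```python
import numpy as np

def householder(v):
    """Householder reflection H = I - 2 v v^T / (v^T v)."""
    v = np.asarray(v, dtype=float)
    return np.eye(len(v)) - 2.0 * np.outer(v, v) / (v @ v)

# A product of two reflections is a rotation (det = +1, norms preserved);
# a hypothetical relation could rotate a head embedding before scoring.
rotation = householder([1.0, 2.0, 0.5]) @ householder([0.3, -1.0, 2.0])
head = np.array([0.2, 0.7, -0.4])
rotated_head = rotation @ head
```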
volume: 162 URL: https://proceedings.mlr.press/v162/li22ac.html PDF: https://proceedings.mlr.press/v162/li22ac/li22ac.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22ac.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bei family: Li - given: Tong family: Zheng - given: Yi family: Jing - given: Chengbo family: Jiao - given: Tong family: Xiao - given: Jingbo family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13225-13241 id: li22ac issued: date-parts: - 2022 - 6 - 28 firstpage: 13225 lastpage: 13241 published: 2022-06-28 00:00:00 +0000 - title: 'Finding Global Homophily in Graph Neural Networks When Meeting Heterophily' abstract: 'We investigate graph neural networks on graphs with heterophily. Some existing methods amplify a node’s neighborhood with multi-hop neighbors to include more nodes with homophily. However, it is a significant challenge to set personalized neighborhood sizes for different nodes. Further, for other homophilous nodes excluded in the neighborhood, they are ignored for information aggregation. To address these problems, we propose two models GloGNN and GloGNN++, which generate a node’s embedding by aggregating information from global nodes in the graph. In each layer, both models learn a coefficient matrix to capture the correlations between nodes, based on which neighborhood aggregation is performed. The coefficient matrix allows signed values and is derived from an optimization problem that has a closed-form solution. We further accelerate neighborhood aggregation and derive a linear time complexity. We theoretically explain the models’ effectiveness by proving that both the coefficient matrix and the generated node embedding matrix have the desired grouping effect. We conduct extensive experiments to compare our models against 11 other competitors on 15 benchmark datasets in a wide range of domains, scales and graph heterophilies. Experimental results show that our methods achieve superior performance and are also very efficient.' volume: 162 URL: https://proceedings.mlr.press/v162/li22ad.html PDF: https://proceedings.mlr.press/v162/li22ad/li22ad.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-li22ad.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiang family: Li - given: Renyu family: Zhu - given: Yao family: Cheng - given: Caihua family: Shan - given: Siqiang family: Luo - given: Dongsheng family: Li - given: Weining family: Qian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13242-13256 id: li22ad issued: date-parts: - 2022 - 6 - 28 firstpage: 13242 lastpage: 13256 published: 2022-06-28 00:00:00 +0000 - title: 'Fat–Tailed Variational Inference with Anisotropic Tail Adaptive Flows' abstract: 'While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they present a problematic scenario when Gaussian-based variational inference fails to accurately capture tail decay. 
We first improve previous theory on tails of Lipschitz flows by quantifying how they affect the rate of tail decay and expanding the theory to non-Lipschitz polynomial flows. Next, we develop an alternative theory for multivariate tail parameters which is sensitive to tail-anisotropy. In doing so, we unveil a fundamental problem which plagues many existing flow-based methods: they can only model tail-isotropic distributions (i.e., distributions having the same tail parameter in every direction). To mitigate this and enable modeling of tail-anisotropic targets, we propose anisotropic tail-adaptive flows (ATAF). Experimental results confirm that ATAF is competitive with prior work on both synthetic and real-world targets while also exhibiting appropriate tail-anisotropy.' volume: 162 URL: https://proceedings.mlr.press/v162/liang22a.html PDF: https://proceedings.mlr.press/v162/liang22a/liang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Feynman family: Liang - given: Michael family: Mahoney - given: Liam family: Hodgkinson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13257-13270 id: liang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 13257 lastpage: 13270 published: 2022-06-28 00:00:00 +0000 - title: 'Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling' abstract: 'Despite the extensive studies on Generative Adversarial Networks (GANs), how to reliably sample high-quality images from their latent spaces remains an under-explored topic. In this paper, we propose a novel GAN latent sampling method by exploring and exploiting the hubness priors of GAN latent distributions. Our key insight is that the high dimensionality of the GAN latent space will inevitably lead to the emergence of hub latents that usually have much larger sampling densities than other latents in the latent space. As a result, these hub latents are better trained and thus contribute more to the synthesis of high-quality images. Unlike a posteriori "cherry-picking", our method is highly efficient as it is an a priori method that identifies high-quality latents before the synthesis of images. Furthermore, we show that the well-known but purely empirical truncation trick is a naive approximation to the central clustering effect of hub latents, which not only uncovers the rationale of the truncation trick, but also indicates the superiority and fundamentality of our method. Extensive experimental results demonstrate the effectiveness of the proposed method. Our code is available at: https://github.com/Byronliang8/HubnessGANSampling.'
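Below is a rough, hedged illustration, not the authors' implementation, of how "hub" latents could be identified a priori via k-occurrence counts among the nearest neighbours of sampled latent codes; the scoring rule, sample size and selection threshold are assumptions made only for exposition.

```python
import numpy as np

def hubness_scores(latents, k=10):
    """k-occurrence hubness: how often each latent appears among the
    k nearest neighbours of the other sampled latents."""
    dists = np.linalg.norm(latents[:, None, :] - latents[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    knn = np.argsort(dists, axis=1)[:, :k]     # indices of the k nearest neighbours
    return np.bincount(knn.ravel(), minlength=len(latents))

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 64))             # candidate latent codes, e.g. from N(0, I)
scores = hubness_scores(z, k=10)
hub_latents = z[np.argsort(scores)[-32:]]      # keep the 32 most hub-like latents
```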
volume: 162 URL: https://proceedings.mlr.press/v162/liang22b.html PDF: https://proceedings.mlr.press/v162/liang22b/liang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuanbang family: Liang - given: Jing family: Wu - given: Yu-Kun family: Lai - given: Yipeng family: Qin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13271-13284 id: liang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 13271 lastpage: 13284 published: 2022-06-28 00:00:00 +0000 - title: 'Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks' abstract: 'In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to reduce overestimation, including several recent ensemble methods, however none have shown success in sample-efficient learning through addressing estimation variance as the root cause of overestimation. In this paper, we propose MeanQ, a simple ensemble method that estimates target values as ensemble means. Despite its simplicity, MeanQ shows remarkable sample efficiency in experiments on the Atari Learning Environment benchmark. Importantly, we find that an ensemble of size 5 sufficiently reduces estimation variance to obviate the lagging target network, eliminating it as a source of bias and further gaining sample efficiency. We justify intuitively and empirically the design choices in MeanQ, including the necessity of independent experience sampling. On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average. MeanQ also outperforms Rainbow DQN at 500K steps in 21/26 environments, and by 49% on average, and achieves average human-level performance using 200K ($\pm$100K) interaction steps. Our implementation is available at https://github.com/indylab/MeanQ.' 
volume: 162 URL: https://proceedings.mlr.press/v162/liang22c.html PDF: https://proceedings.mlr.press/v162/liang22c/liang22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Litian family: Liang - given: Yaosheng family: Xu - given: Stephen family: Mcaleer - given: Dailin family: Hu - given: Alexander family: Ihler - given: Pieter family: Abbeel - given: Roy family: Fox editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13285-13301 id: liang22c issued: date-parts: - 2022 - 6 - 28 firstpage: 13285 lastpage: 13301 published: 2022-06-28 00:00:00 +0000 - title: 'TSPipe: Learn from Teacher Faster with Pipelines' abstract: 'The teacher-student (TS) framework, training a (student) network by utilizing an auxiliary superior (teacher) network, has been adopted as a popular training paradigm in many machine learning schemes, since the seminal work—Knowledge distillation (KD) for model compression and transfer learning. Many recent self-supervised learning (SSL) schemes also adopt the TS framework, where teacher networks are maintained as the moving average of student networks, called the momentum networks. This paper presents TSPipe, a pipelined approach to accelerate the training process of any TS frameworks including KD and SSL. Under the observation that the teacher network does not need a backward pass, our main idea is to schedule the computation of the teacher and student network separately, and fully utilize the GPU during training by interleaving the computations of the two networks and relaxing their dependencies. In case the teacher network requires a momentum update, we use delayed parameter updates only on the teacher network to attain high model accuracy. Compared to existing pipeline parallelism schemes, which sacrifice either training throughput or model accuracy, TSPipe provides better performance trade-offs, achieving up to 12.15x higher throughput.' volume: 162 URL: https://proceedings.mlr.press/v162/lim22a.html PDF: https://proceedings.mlr.press/v162/lim22a/lim22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lim22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hwijoon family: Lim - given: Yechan family: Kim - given: Sukmin family: Yun - given: Jinwoo family: Shin - given: Dongsu family: Han editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13302-13312 id: lim22a issued: date-parts: - 2022 - 6 - 28 firstpage: 13302 lastpage: 13312 published: 2022-06-28 00:00:00 +0000 - title: 'Order Constraints in Optimal Transport' abstract: 'Optimal transport is a framework for comparing measures whereby a cost is incurred for transporting one measure to another. Recent works have aimed to improve optimal transport plans through the introduction of various forms of structure. We introduce novel order constraints into the optimal transport formulation to allow for the incorporation of structure. 
We define an efficient method for obtaining explainable solutions to the new formulation that scales far better than standard approaches. The theoretical properties of the method are provided. We demonstrate experimentally that order constraints improve explainability using the e-SNLI (Stanford Natural Language Inference) dataset that includes human-annotated rationales as well as on several image color transfer examples.' volume: 162 URL: https://proceedings.mlr.press/v162/lim22b.html PDF: https://proceedings.mlr.press/v162/lim22b/lim22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lim22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yu Chin Fabian family: Lim - given: Laura family: Wynter - given: Shiau Hong family: Lim editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13313-13333 id: lim22b issued: date-parts: - 2022 - 6 - 28 firstpage: 13313 lastpage: 13333 published: 2022-06-28 00:00:00 +0000 - title: 'Flow-Guided Sparse Transformer for Video Deblurring' abstract: 'Exploiting similar and sharper scene patches in spatio-temporal neighborhoods is critical for video deblurring. However, CNN-based methods show limitations in capturing long-range dependencies and modeling non-local self-similarity. In this paper, we propose a novel framework, Flow-Guided Sparse Transformer (FGST), for video deblurring. In FGST, we customize a self-attention module, Flow-Guided Sparse Window-based Multi-head Self-Attention (FGSW-MSA). For each $query$ element on the blurry reference frame, FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse yet highly related $key$ elements corresponding to the same scene patch in neighboring frames. Besides, we present a Recurrent Embedding (RE) mechanism to transfer information from past frames and strengthen long-range temporal dependencies. Comprehensive experiments demonstrate that our proposed FGST outperforms state-of-the-art (SOTA) methods on both DVD and GOPRO datasets and yields visually pleasant results in real video deblurring. 
https://github.com/linjing7/VR-Baseline' volume: 162 URL: https://proceedings.mlr.press/v162/lin22a.html PDF: https://proceedings.mlr.press/v162/lin22a/lin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jing family: Lin - given: Yuanhao family: Cai - given: Xiaowan family: Hu - given: Haoqian family: Wang - given: Youliang family: Yan - given: Xueyi family: Zou - given: Henghui family: Ding - given: Yulun family: Zhang - given: Radu family: Timofte - given: Luc family: Van Gool editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13334-13343 id: lin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 13334 lastpage: 13343 published: 2022-06-28 00:00:00 +0000 - title: 'Federated Learning with Positive and Unlabeled Data' abstract: 'We study the problem of learning from positive and unlabeled (PU) data in the federated setting, where each client only labels a small part of its dataset due to limited resources and time. In contrast to traditional PU learning, where the negative class consists of a single class, the negative samples that a client cannot identify in the federated setting may come from multiple classes that are unknown to the client. Therefore, existing PU learning methods can hardly be applied in this situation. To address this problem, we propose a novel framework, namely Federated learning with Positive and Unlabeled data (FedPU), to minimize the expected risk of multiple negative classes by leveraging the labeled data in other clients. We theoretically analyze the generalization bound of the proposed FedPU. Empirical experiments show that FedPU can achieve much better performance than conventional supervised and semi-supervised federated learning methods.' volume: 162 URL: https://proceedings.mlr.press/v162/lin22b.html PDF: https://proceedings.mlr.press/v162/lin22b/lin22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lin22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xinyang family: Lin - given: Hanting family: Chen - given: Yixing family: Xu - given: Chao family: Xu - given: Xiaolin family: Gui - given: Yiping family: Deng - given: Yunhe family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13344-13355 id: lin22b issued: date-parts: - 2022 - 6 - 28 firstpage: 13344 lastpage: 13355 published: 2022-06-28 00:00:00 +0000 - title: 'Decentralized Online Convex Optimization in Networked Systems' abstract: 'We study the problem of networked online convex optimization, where each agent individually decides on an action at every time step and agents cooperatively seek to minimize the total global cost over a finite horizon. The global cost is made up of three types of local costs: convex node costs, temporal interaction costs, and spatial interaction costs.
In deciding their individual action at each time, an agent has access to predictions of local cost functions for the next $k$ time steps in an $r$-hop neighborhood. Our work proposes a novel online algorithm, Localized Predictive Control (LPC), which generalizes predictive control to multi-agent systems. We show that LPC achieves a competitive ratio of $1 + \tilde{O}(\rho_T^k) + \tilde{O}(\rho_S^r)$ in an adversarial setting, where $\rho_T$ and $\rho_S$ are constants in $(0, 1)$ that increase with the relative strength of temporal and spatial interaction costs, respectively. This is the first competitive ratio bound on decentralized predictive control for networked online convex optimization. Further, we show that the dependence on $k$ and $r$ in our results is near optimal by lower bounding the competitive ratio of any decentralized online algorithm.' volume: 162 URL: https://proceedings.mlr.press/v162/lin22c.html PDF: https://proceedings.mlr.press/v162/lin22c/lin22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lin22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yiheng family: Lin - given: Judy family: Gan - given: Guannan family: Qu - given: Yash family: Kanoria - given: Adam family: Wierman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13356-13393 id: lin22c issued: date-parts: - 2022 - 6 - 28 firstpage: 13356 lastpage: 13393 published: 2022-06-28 00:00:00 +0000 - title: 'Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration' abstract: 'How to properly model the inter-frame relation within the video sequence is an important but unsolved challenge for video restoration (VR). In this work, we propose an unsupervised flow-aligned sequence-to-sequence model (S2SVR) to address this problem. On the one hand, the sequence-to-sequence model, which has proven capable of sequence modeling in the field of natural language processing, is explored for the first time in VR. Optimized serialization modeling shows potential in capturing long-range dependencies among frames. On the other hand, we equip the sequence-to-sequence model with an unsupervised optical flow estimator to maximize its potential. The flow estimator is trained with our proposed unsupervised distillation loss, which can alleviate the data discrepancy and inaccurate degraded optical flow issues of previous flow-based methods. With reliable optical flow, we can establish accurate correspondence among multiple frames, narrowing the domain difference between 1D language and 2D misaligned frames and improving the potential of the sequence-to-sequence model. S2SVR shows superior performance in multiple VR tasks, including video deblurring, video super-resolution, and compressed video quality enhancement. 
https://github.com/linjing7/VR-Baseline' volume: 162 URL: https://proceedings.mlr.press/v162/lin22d.html PDF: https://proceedings.mlr.press/v162/lin22d/lin22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lin22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jing family: Lin - given: Xiaowan family: Hu - given: Yuanhao family: Cai - given: Haoqian family: Wang - given: Youliang family: Yan - given: Xueyi family: Zou - given: Yulun family: Zhang - given: Luc family: Van Gool editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13394-13404 id: lin22d issued: date-parts: - 2022 - 6 - 28 firstpage: 13394 lastpage: 13404 published: 2022-06-28 00:00:00 +0000 - title: 'Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks' abstract: 'We propose new, more efficient targeted white-box attacks against deep neural networks. Our attacks better align with the attacker’s goal: (1) tricking a model to assign higher probability to the target class than to any other class, while (2) staying within an $\epsilon$-distance of the attacked input. First, we demonstrate a loss function that explicitly encodes (1) and show that Auto-PGD finds more attacks with it. Second, we propose a new attack method, Constrained Gradient Descent (CGD), using a refinement of our loss function that captures both (1) and (2). CGD seeks to satisfy both attacker objectives—misclassification and bounded $\ell_{p}$-norm—in a principled manner, as part of the optimization, instead of via ad hoc post-processing techniques (e.g., projection or clipping). We show that CGD is more successful on CIFAR10 (0.9–4.2%) and ImageNet (8.6–13.6%) than state-of-the-art attacks while consuming less time (11.4–18.8%). Statistical tests confirm that our attack outperforms others against leading defenses on different datasets and values of $\epsilon$.' volume: 162 URL: https://proceedings.mlr.press/v162/lin22e.html PDF: https://proceedings.mlr.press/v162/lin22e/lin22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lin22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weiran family: Lin - given: Keane family: Lucas - given: Lujo family: Bauer - given: Michael K. family: Reiter - given: Mahmood family: Sharif editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13405-13430 id: lin22e issued: date-parts: - 2022 - 6 - 28 firstpage: 13405 lastpage: 13430 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Augmented Binary Search Trees' abstract: 'A treap is a classic randomized binary search tree data structure that is easy to implement and supports O(log n) expected time access. However, classic treaps do not take advantage of the input distribution or patterns in the input. Given recent advances in algorithms with predictions, we propose pairing treaps with machine advice to form a learning-augmented treap. 
We are the first to propose a learning-augmented data structure that supports binary search tree operations such as range-query and successor functionalities. With the assumption that we have access to advice from a frequency estimation oracle, we assign learned priorities to the nodes to better improve the treap’s structure. We theoretically analyze the learning-augmented treap’s performance under various input distributions and show that under those circumstances, our learning-augmented treap has stronger guarantees than classic treaps and other classic tree-based data structures. Further, we experimentally evaluate our learned treap on synthetic datasets and demonstrate a performance advantage over other search tree data structures. We also present experiments on real world datasets with known frequency estimation oracles and show improvements as well.' volume: 162 URL: https://proceedings.mlr.press/v162/lin22f.html PDF: https://proceedings.mlr.press/v162/lin22f/lin22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lin22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Honghao family: Lin - given: Tian family: Luo - given: David family: Woodruff editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13431-13440 id: lin22f issued: date-parts: - 2022 - 6 - 28 firstpage: 13431 lastpage: 13440 published: 2022-06-28 00:00:00 +0000 - title: 'Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback' abstract: 'Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings. In contrast to previous works on online unconstrained submodular minimization, we focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms in static and delayed scenarios. We derive bounds for the agent’s regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded. Key to our approach is the notion of $(\alpha, \beta)$-regret and the extension of the generic convex relaxation model from \citet{El-2020-Optimal}, the analysis of which is of independent interest. We conduct and showcase several simulation studies to demonstrate the efficacy of our algorithms.' 
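As a small, hedged sketch of online gradient descent with delayed cost feedback, in the spirit of the delayed-cost setting summarised in the abstract above (the delay model, step size, projection radius and quadratic losses are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def delayed_ogd(grad_fns, delays, dim, eta=0.1, radius=1.0):
    """Online gradient descent where the gradient of round t only
    arrives (and is applied) at round t + delays[t]."""
    x = np.zeros(dim)
    pending = {}                                   # arrival round -> list of gradients
    decisions = []
    for t, grad_fn in enumerate(grad_fns):
        decisions.append(x.copy())                 # play the current decision
        pending.setdefault(t + delays[t], []).append(grad_fn(x))
        for g in pending.pop(t, []):               # apply gradients arriving now
            x = x - eta * g
        norm = np.linalg.norm(x)
        if norm > radius:                          # project back onto the feasible ball
            x = x * (radius / norm)
    return decisions

# Quadratic losses f_t(x) = ||x - c_t||^2 with random 1- or 2-round delays
rng = np.random.default_rng(1)
centers = rng.standard_normal((20, 3))
grads = [lambda x, c=c: 2.0 * (x - c) for c in centers]
played = delayed_ogd(grads, delays=rng.integers(1, 3, size=20), dim=3)
```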
volume: 162 URL: https://proceedings.mlr.press/v162/lin22g.html PDF: https://proceedings.mlr.press/v162/lin22g/lin22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lin22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianyi family: Lin - given: Aldo family: Pacchiano - given: Yaodong family: Yu - given: Michael family: Jordan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13441-13467 id: lin22g issued: date-parts: - 2022 - 6 - 28 firstpage: 13441 lastpage: 13467 published: 2022-06-28 00:00:00 +0000 - title: 'Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments' abstract: 'We develop a new, principled algorithm for estimating the contribution of training data points to the behavior of a deep learning model, such as a specific prediction it makes. Our algorithm estimates the AME, a quantity that measures the expected (average) marginal effect of adding a data point to a subset of the training data, sampled from a given distribution. When subsets are sampled from the uniform distribution, the AME reduces to the well-known Shapley value. Our approach is inspired by causal inference and randomized experiments: we sample different subsets of the training data to train multiple submodels, and evaluate each submodel’s behavior. We then use a LASSO regression to jointly estimate the AME of each data point, based on the subset compositions. Under sparsity assumptions ($k \ll N$ datapoints have large AME), our estimator requires only $O(k\log N)$ randomized submodel trainings, improving upon the best prior Shapley value estimators.' volume: 162 URL: https://proceedings.mlr.press/v162/lin22h.html PDF: https://proceedings.mlr.press/v162/lin22h/lin22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lin22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jinkun family: Lin - given: Anqi family: Zhang - given: Mathias family: Lécuyer - given: Jinyang family: Li - given: Aurojit family: Panda - given: Siddhartha family: Sen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13468-13504 id: lin22h issued: date-parts: - 2022 - 6 - 28 firstpage: 13468 lastpage: 13504 published: 2022-06-28 00:00:00 +0000 - title: 'Interactively Learning Preference Constraints in Linear Bandits' abstract: 'We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem which we call constrained linear best-arm identification. To solve this problem, we propose the Adaptive Constraint Learning (ACOL) algorithm. We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL’s sample complexity matches the lower bound in the worst-case. 
In the average case, ACOL’s sample complexity bound is still significantly tighter than bounds of simpler approaches. In synthetic experiments, ACOL performs on par with an oracle solution and outperforms a range of baselines. As an application, we consider learning constraints to represent human preferences in a driving simulation. ACOL is significantly more sample efficient than alternatives for this application. Further, we find that learning preferences as constraints is more robust to changes in the driving scenario than encoding the preferences directly in the reward function.' volume: 162 URL: https://proceedings.mlr.press/v162/lindner22a.html PDF: https://proceedings.mlr.press/v162/lindner22a/lindner22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lindner22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: David family: Lindner - given: Sebastian family: Tschiatschek - given: Katja family: Hofmann - given: Andreas family: Krause editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13505-13527 id: lindner22a issued: date-parts: - 2022 - 6 - 28 firstpage: 13505 lastpage: 13527 published: 2022-06-28 00:00:00 +0000 - title: 'Delayed Reinforcement Learning by Imitation' abstract: 'When the agent’s observations or interactions are delayed, classic reinforcement learning tools usually fail. In this paper, we propose a simple yet new and efficient solution to this problem. We assume that, in the undelayed environment, an efficient policy is known or can be easily learnt, but the task may suffer from delays in practice and we thus want to take them into account. We present a novel algorithm, Delayed Imitation with Dataset Aggregation (DIDA), which builds upon imitation learning methods to learn how to act in a delayed environment from undelayed demonstrations. We provide a theoretical analysis of the approach that will guide the practical design of DIDA. These results are also of general interest in the delayed reinforcement learning literature by providing bounds on the performance between delayed and undelayed tasks, under smoothness conditions. We show empirically that DIDA obtains high performances with a remarkable sample efficiency on a variety of tasks, including robotic locomotion, classic control, and trading.' 
volume: 162 URL: https://proceedings.mlr.press/v162/liotet22a.html PDF: https://proceedings.mlr.press/v162/liotet22a/liotet22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liotet22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pierre family: Liotet - given: Davide family: Maran - given: Lorenzo family: Bisi - given: Marcello family: Restelli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13528-13556 id: liotet22a issued: date-parts: - 2022 - 6 - 28 firstpage: 13528 lastpage: 13556 published: 2022-06-28 00:00:00 +0000 - title: 'CITRIS: Causal Identifiability from Temporal Intervened Sequences' abstract: 'Understanding the latent causal factors of a dynamical system from visual observations is considered a crucial step towards agents reasoning in complex environments. In this paper, we propose CITRIS, a variational autoencoder framework that learns causal representations from temporal sequences of images in which underlying causal factors have possibly been intervened upon. In contrast to the recent literature, CITRIS exploits temporality and observing intervention targets to identify scalar and multidimensional causal factors, such as 3D rotation angles. Furthermore, by introducing a normalizing flow, CITRIS can be easily extended to leverage and disentangle representations obtained by already pretrained autoencoders. Extending previous results on scalar causal factors, we prove identifiability in a more general setting, in which only some components of a causal factor are affected by interventions. In experiments on 3D rendered image sequences, CITRIS outperforms previous methods on recovering the underlying causal variables. Moreover, using pretrained autoencoders, CITRIS can even generalize to unseen instantiations of causal factors, opening future research areas in sim-to-real generalization for causal representation learning.' volume: 162 URL: https://proceedings.mlr.press/v162/lippe22a.html PDF: https://proceedings.mlr.press/v162/lippe22a/lippe22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lippe22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Phillip family: Lippe - given: Sara family: Magliacane - given: Sindy family: Löwe - given: Yuki M family: Asano - given: Taco family: Cohen - given: Stratis family: Gavves editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13557-13603 id: lippe22a issued: date-parts: - 2022 - 6 - 28 firstpage: 13557 lastpage: 13603 published: 2022-06-28 00:00:00 +0000 - title: 'StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models' abstract: 'Knowledge and language understanding of models evaluated through question answering (QA) has been usually studied on static snapshots of knowledge, like Wikipedia. However, our world is dynamic, evolves over time, and our models’ knowledge becomes outdated. 
To study how semi-parametric QA models and their underlying parametric language models (LMs) adapt to evolving knowledge, we construct a new large-scale dataset, StreamingQA, with human-written and generated questions asked on a given date, to be answered from 14 years of time-stamped news articles. We evaluate our models quarterly as they read new articles not seen in pre-training. We show that parametric models can be updated without full retraining, while avoiding catastrophic forgetting. For semi-parametric models, adding new articles into the search space allows for rapid adaptation; however, models with an outdated underlying LM under-perform those with a retrained LM. For questions about higher-frequency named entities, parametric updates are particularly beneficial. In our dynamic world, the StreamingQA dataset enables a more realistic evaluation of QA models, and our experiments highlight several promising directions for future research.' volume: 162 URL: https://proceedings.mlr.press/v162/liska22a.html PDF: https://proceedings.mlr.press/v162/liska22a/liska22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liska22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adam family: Liska - given: Tomas family: Kocisky - given: Elena family: Gribovskaya - given: Tayfun family: Terzi - given: Eren family: Sezener - given: Devang family: Agrawal - given: Cyprien family: De Masson D’Autume - given: Tim family: Scholtes - given: Manzil family: Zaheer - given: Susannah family: Young - given: Ellen family: Gilsenan-Mcmahon - given: Sophia family: Austin - given: Phil family: Blunsom - given: Angeliki family: Lazaridou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13604-13622 id: liska22a issued: date-parts: - 2022 - 6 - 28 firstpage: 13604 lastpage: 13622 published: 2022-06-28 00:00:00 +0000 - title: 'Distributionally Robust $Q$-Learning' abstract: 'Reinforcement learning (RL) has demonstrated remarkable achievements in simulated environments. However, carrying this success to real environments requires the important attribute of robustness, which the existing RL algorithms often lack as they assume that the future deployment environment is the same as the training environment (i.e. simulator) in which the policy is learned. This assumption often does not hold due to the discrepancy between the simulator and the real environment, which renders the learned policy fragile when deployed. In this paper, we propose a novel distributionally robust $Q$-learning algorithm that learns the best policy in the worst distributional perturbation of the environment. Our algorithm first transforms the infinite-dimensional learning problem (since the environment MDP perturbation lies in an infinite-dimensional space) into a finite-dimensional dual problem and subsequently uses a multi-level Monte-Carlo scheme to approximate the dual value using samples from the simulator. Despite the complexity, we show that the resulting distributionally robust $Q$-learning algorithm asymptotically converges to the optimal worst-case policy, thus making it robust to future environment changes. Simulation results further demonstrate its strong empirical robustness.'
volume: 162 URL: https://proceedings.mlr.press/v162/liu22a.html PDF: https://proceedings.mlr.press/v162/liu22a/liu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zijian family: Liu - given: Qinxun family: Bai - given: Jose family: Blanchet - given: Perry family: Dong - given: Wei family: Xu - given: Zhengqing family: Zhou - given: Zhengyuan family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13623-13643 id: liu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 13623 lastpage: 13643 published: 2022-06-28 00:00:00 +0000 - title: 'Constrained Variational Policy Optimization for Safe Reinforcement Learning' abstract: 'Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality guarantees. This paper overcomes the issues from the perspective of probabilistic inference. We introduce a novel Expectation-Maximization approach to naturally incorporate constraints during the policy learning: 1) a provable optimal non-parametric variational distribution could be computed in closed form after a convex optimization (E-step); 2) the policy parameter is improved within the trust region based on the optimal variational distribution (M-step). The proposed algorithm decomposes the safe RL problem into a convex optimization phase and a supervised learning phase, which yields a more stable training performance. A wide range of experiments on continuous robotic tasks shows that the proposed method achieves significantly better constraint satisfaction performance and better sample efficiency than baselines. The code is available at https://github.com/liuzuxin/cvpo-safe-rl.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22b.html PDF: https://proceedings.mlr.press/v162/liu22b/liu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zuxin family: Liu - given: Zhepeng family: Cen - given: Vladislav family: Isenbaev - given: Wei family: Liu - given: Steven family: Wu - given: Bo family: Li - given: Ding family: Zhao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13644-13668 id: liu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 13644 lastpage: 13668 published: 2022-06-28 00:00:00 +0000 - title: 'Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint' abstract: 'Overparameterized neural networks enjoy great representation power on complex data, and more importantly yield sufficiently smooth output, which is crucial to their generalization and robustness. Most existing function approximation theories suggest that with sufficiently many parameters, neural networks can well approximate certain classes of functions in terms of the function value. 
The neural networks themselves, however, can be highly nonsmooth. To bridge this gap, we take convolutional residual networks (ConvResNets) as an example, and prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness. Moreover, we extend our theory to approximating functions supported on a low-dimensional manifold. Our theory partially justifies the benefits of using deep and wide networks in practice. Numerical experiments on adversarially robust image classification are provided to support our theory.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22c.html PDF: https://proceedings.mlr.press/v162/liu22c/liu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hao family: Liu - given: Minshuo family: Chen - given: Siawpeng family: Er - given: Wenjing family: Liao - given: Tong family: Zhang - given: Tuo family: Zhao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13669-13703 id: liu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 13669 lastpage: 13703 published: 2022-06-28 00:00:00 +0000 - title: 'Boosting Graph Structure Learning with Dummy Nodes' abstract: 'With the development of graph kernels and graph representation learning, many superior methods have been proposed to handle scalability and oversmoothing issues on graph structure learning. However, most of those strategies are designed based on practical experience rather than theoretical analysis. In this paper, we use a particular dummy node connected to all existing vertices without affecting the original vertex and edge properties. We further prove that such a dummy node can help build an efficient monomorphic edge-to-vertex transform and an epimorphic inverse to recover the original graph. This also indicates that adding dummy nodes can preserve local and global structures for better graph representation learning. We extend graph kernels and graph neural networks with dummy nodes and conduct experiments on graph classification and subgraph isomorphism matching tasks. Empirical results demonstrate that taking graphs with dummy nodes as input significantly boosts graph structure learning, and using their edge-to-vertex graphs can also achieve similar results. We also discuss the gain in expressive power from the dummy node in neural networks.'
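A minimal sketch of the basic construction mentioned in the abstract above: appending one dummy node connected to every existing vertex of an adjacency matrix. This is purely illustrative and not the authors' code; the adjacency-matrix representation and undirected, unweighted edges are assumptions.

```python
import numpy as np

def add_dummy_node(adj):
    """Return a copy of the adjacency matrix with one extra node
    connected to all original vertices (no self-loop on the dummy)."""
    n = adj.shape[0]
    out = np.zeros((n + 1, n + 1), dtype=adj.dtype)
    out[:n, :n] = adj
    out[n, :n] = 1            # dummy -> every original vertex
    out[:n, n] = 1            # every original vertex -> dummy
    return out

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(add_dummy_node(A))
```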
volume: 162 URL: https://proceedings.mlr.press/v162/liu22d.html PDF: https://proceedings.mlr.press/v162/liu22d/liu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xin family: Liu - given: Jiayang family: Cheng - given: Yangqiu family: Song - given: Xin family: Jiang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13704-13716 id: liu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 13704 lastpage: 13716 published: 2022-06-28 00:00:00 +0000 - title: 'Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent' abstract: 'Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) are regret minimization algorithms for Online Convex Optimization (OCO); they are mathematically elegant but less practical for solving Extensive-Form Games (EFGs). Counterfactual Regret Minimization (CFR) is a technique for approximating Nash equilibria in EFGs. CFR and its variants have a fast convergence rate in practice, but their theoretical results are not satisfactory. In recent years, researchers have been trying to link CFRs with OCO algorithms, which may provide new theoretical results and inspire new algorithms. However, existing analysis is restricted to local decision points. In this paper, we show that CFRs with Regret Matching and Regret Matching+ are equivalent to special cases of FTRL and OMD, respectively. Based on these equivalences, a new FTRL and a new OMD algorithm, which can be considered as extensions of vanilla CFR and CFR+, are derived. The experimental results show that the two variants converge faster than conventional FTRL and OMD, even faster than vanilla CFR and CFR+ in some EFGs.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22e.html PDF: https://proceedings.mlr.press/v162/liu22e/liu22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weiming family: Liu - given: Huacong family: Jiang - given: Bin family: Li - given: Houqiang family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13717-13745 id: liu22e issued: date-parts: - 2022 - 6 - 28 firstpage: 13717 lastpage: 13745 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Probability Estimation' abstract: 'Reliable probability estimation is of crucial importance in many real-world applications where there is inherent (aleatoric) uncertainty. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or whether a patient has died or not), because the ground-truth probabilities of the events of interest are typically unknown. The problem is therefore analogous to binary classification, with the difference that the objective is to estimate probabilities rather than predicting the specific outcome. This work investigates probability estimation from high-dimensional data using deep neural networks.
There exist several methods to improve the probabilities generated by these models, but they mostly focus on model (epistemic) uncertainty. For problems with inherent uncertainty, it is challenging to evaluate performance without access to ground-truth probabilities. To address this, we build a synthetic dataset to study and compare different computable metrics. We evaluate existing methods on the synthetic data as well as on three real-world probability estimation tasks, all of which involve inherent uncertainty: precipitation forecasting from radar images, predicting cancer patient survival from histopathology images, and predicting car crashes from dashcam videos. We also give a theoretical analysis of a model for high-dimensional probability estimation which reproduces several of the phenomena evinced in our experiments. Finally, we propose a new method for probability estimation using neural networks, which modifies the training process to promote output probabilities that are consistent with empirical probabilities computed from the data. The method outperforms existing approaches on most metrics on the simulated as well as real-world data.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22f.html PDF: https://proceedings.mlr.press/v162/liu22f/liu22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sheng family: Liu - given: Aakash family: Kaku - given: Weicheng family: Zhu - given: Matan family: Leibovich - given: Sreyas family: Mohan - given: Boyang family: Yu - given: Haoxiang family: Huang - given: Laure family: Zanna - given: Narges family: Razavian - given: Jonathan family: Niles-Weed - given: Carlos family: Fernandez-Granda editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13746-13781 id: liu22f issued: date-parts: - 2022 - 6 - 28 firstpage: 13746 lastpage: 13781 published: 2022-06-28 00:00:00 +0000 - title: 'Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers' abstract: 'Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their outrageous scaling capability, which enables dramatic increases in model size without significant increases in computational cost. To achieve this, MoE models replace the feedforward sub-layer with a Mixture-of-Experts sub-layer in transformers and use a gating network to route each token to its assigned experts. Since the common practice for efficient training of such models requires distributing experts and tokens across different machines, this routing strategy often incurs huge cross-machine communication cost because tokens and their assigned experts likely reside on different machines. In this paper, we propose Gating Dropout, which allows tokens to ignore the gating network and stay at their local machines, thus reducing the cross-machine communication. Similar to traditional dropout, we also show that Gating Dropout has a regularization effect during training, resulting in improved generalization performance. We validate the effectiveness of Gating Dropout on multilingual machine translation tasks.
Our results demonstrate that Gating Dropout improves a state-of-the-art MoE model with faster wall-clock time convergence rates and better BLEU scores for a variety of model sizes and datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22g.html PDF: https://proceedings.mlr.press/v162/liu22g/liu22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rui family: Liu - given: Young Jin family: Kim - given: Alexandre family: Muzio - given: Hany family: Hassan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13782-13792 id: liu22g issued: date-parts: - 2022 - 6 - 28 firstpage: 13782 lastpage: 13792 published: 2022-06-28 00:00:00 +0000 - title: 'Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games' abstract: 'Learning to play optimally against any mixture over a diverse set of strategies is of important practical interests in competitive games. In this paper, we propose simplex-NeuPL that satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learn best-responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near optimal returns against arbitrary mixture policies in a game with tractable best-responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights in using this flexibility at test time. Finally, we offer evidence that learning best-responses to any mixture policies is an effective auxiliary task for strategic exploration, which, by itself, can lead to more performant populations.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22h.html PDF: https://proceedings.mlr.press/v162/liu22h/liu22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Siqi family: Liu - given: Marc family: Lanctot - given: Luke family: Marris - given: Nicolas family: Heess editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13793-13806 id: liu22h issued: date-parts: - 2022 - 6 - 28 firstpage: 13793 lastpage: 13806 published: 2022-06-28 00:00:00 +0000 - title: 'Rethinking Attention-Model Explainability through Faithfulness Violation Test' abstract: 'Attention mechanisms are dominating the explainability of deep models. They produce probability distributions over the input, which are widely deemed as feature-importance indicators. However, in this paper, we find one critical limitation in attention explanations: weakness in identifying the polarity of feature impact. This would be somehow misleading – features with higher attention weights may not faithfully contribute to model predictions; instead, they can impose suppression effects. 
With this finding, we reflect on the explainability of current attention-based techniques, such as Attention $\bigodot$ Gradient and LRP-based attention explanations. We first propose an actionable diagnostic methodology (henceforth faithfulness violation test) to measure the consistency between explanation weights and the impact polarity. Through extensive experiments, we then show that most tested explanation methods are unexpectedly hindered by the faithfulness violation issue, especially the raw attention. Empirical analyses on the factors affecting violation issues further provide useful observations for adopting explanation methods in attention models.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22i.html PDF: https://proceedings.mlr.press/v162/liu22i/liu22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yibing family: Liu - given: Haoliang family: Li - given: Yangyang family: Guo - given: Chenqi family: Kong - given: Jing family: Li - given: Shiqi family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13807-13824 id: liu22i issued: date-parts: - 2022 - 6 - 28 firstpage: 13807 lastpage: 13824 published: 2022-06-28 00:00:00 +0000 - title: 'Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training' abstract: 'Recently, Optimization-Derived Learning (ODL), which designs learning models from the perspective of optimization, has attracted attention in the learning and vision areas. However, previous ODL approaches regard the training and hyper-training procedures as two separate stages, meaning that the hyper-training variables have to be fixed during the training process, and thus it is also impossible to simultaneously obtain the convergence of training and hyper-training variables. In this work, we design a Generalized Krasnoselskii-Mann (GKM) scheme based on fixed-point iterations as our fundamental ODL module, which unifies existing ODL methods as special cases. Under the GKM scheme, a Bilevel Meta Optimization (BMO) algorithmic framework is constructed to solve the optimal training and hyper-training variables together. We rigorously prove the essential joint convergence of the fixed-point iteration for training and the process of optimizing hyper-parameters for hyper-training, both on the approximation quality and on the stationarity analysis. Experiments demonstrate the efficiency of BMO with competitive performance on sparse coding and real-world applications such as image deconvolution and rain streak removal.'
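A hedged toy example of a Krasnoselskii-Mann fixed-point iteration of the kind the abstract above refers to, x_{k+1} = (1 - alpha) x_k + alpha T(x_k), shown here for a simple forward-backward operator on a small lasso-like problem; the operator, relaxation parameter and stopping rule are illustrative assumptions, not the GKM/BMO method itself.

```python
import numpy as np

def krasnoselskii_mann(T, x0, alpha=0.5, iters=200, tol=1e-10):
    """Relaxed fixed-point iteration x_{k+1} = (1 - alpha) x_k + alpha T(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_next = (1.0 - alpha) * x + alpha * T(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Toy averaged (hence nonexpansive) operator: one forward-backward step
# of min_x 0.5 x^T A x - b^T x + lam ||x||_1 with step < 2 / L.
A = np.array([[1.0, 0.2], [0.2, 1.0]])
b = np.array([1.0, -1.0])
step, lam = 0.5, 0.1
prox = lambda v: np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
T = lambda x: prox(x - step * (A @ x - b))
print(krasnoselskii_mann(T, np.zeros(2)))
```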
volume: 162 URL: https://proceedings.mlr.press/v162/liu22j.html PDF: https://proceedings.mlr.press/v162/liu22j/liu22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Risheng family: Liu - given: Xuan family: Liu - given: Shangzhi family: Zeng - given: Jin family: Zhang - given: Yixuan family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13825-13856 id: liu22j issued: date-parts: - 2022 - 6 - 28 firstpage: 13825 lastpage: 13856 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning' abstract: 'Model fusion without accessing training data in machine learning has attracted increasing interest due to practical resource-saving and data-privacy concerns. During the training process, the neural weights of each model can be randomly permuted, and we have to align the channels of each layer before fusing them. Regarding the channels as nodes and weights as edges, aligning the channels to maximize weight similarity is a challenging NP-hard assignment problem. Due to its quadratic assignment nature, we formulate the model fusion problem as a graph matching task, considering the second-order similarity of model weights, rather than merely formulating model fusion as a linear assignment problem as in previous work. To address the growing problem scale and multi-model consistency issues, we propose an efficient graduated assignment-based model fusion method, dubbed GAMF, which iteratively updates the matchings in a consistency-maintaining manner. We apply GAMF to tackle the compact model ensemble and federated learning tasks on MNIST, CIFAR-10, CIFAR-100, and Tiny-Imagenet. The results show the efficacy of GAMF compared to state-of-the-art baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22k.html PDF: https://proceedings.mlr.press/v162/liu22k/liu22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chang family: Liu - given: Chenfei family: Lou - given: Runzhong family: Wang - given: Alan Yuhan family: Xi - given: Li family: Shen - given: Junchi family: Yan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13857-13869 id: liu22k issued: date-parts: - 2022 - 6 - 28 firstpage: 13857 lastpage: 13869 published: 2022-06-28 00:00:00 +0000 - title: 'Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy' abstract: 'We study a bilevel economic system, which we refer to as a Markov exchange economy (MEE), from the point of view of multi-agent reinforcement learning (MARL). An MEE involves a central planner and a group of self-interested agents. The goal of the agents is to form a Competitive Equilibrium (CE), where each agent myopically maximizes her own utility at each step.
The goal of the central planner is to steer the system so as to maximize social welfare, which is defined as the sum of the utilities of all agents. Working in a setting in which the utility function and the system dynamics are both unknown, we propose to find the socially optimal policy and the CE from data via both online and offline variants of MARL. Concretely, we first devise a novel suboptimality metric specifically tailored to MEE, such that minimizing such a metric certifies globally optimal policies for both the planner and the agents. Second, in the online setting, we propose an algorithm, dubbed as \texttt{MOLM}, which combines the optimism principle for exploration with subgame CE seeking. Our algorithm can readily incorporate general function approximation tools for handling large state spaces and achieves a sublinear regret. Finally, we adapt the algorithm to an offline setting based on the pessimism principle and establish an upper bound on the suboptimality.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22l.html PDF: https://proceedings.mlr.press/v162/liu22l/liu22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhihan family: Liu - given: Miao family: Lu - given: Zhaoran family: Wang - given: Michael family: Jordan - given: Zhuoran family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13870-13911 id: liu22l issued: date-parts: - 2022 - 6 - 28 firstpage: 13870 lastpage: 13911 published: 2022-06-28 00:00:00 +0000 - title: 'Generating 3D Molecules for Target Protein Binding' abstract: 'A fundamental problem in drug discovery is to design molecules that bind to specific proteins. To tackle this problem using machine learning methods, here we propose a novel and effective framework, known as GraphBP, to generate 3D molecules that bind to given proteins by placing atoms of specific types and locations to the given binding site one by one. In particular, at each step, we first employ a 3D graph neural network to obtain geometry-aware and chemically informative representations from the intermediate contextual information. Such context includes the given binding site and atoms placed in the previous steps. Second, to preserve the desirable equivariance property, we select a local reference atom according to the designed auxiliary classifiers and then construct a local spherical coordinate system. Finally, to place a new atom, we generate its atom type and relative location w.r.t. the constructed local coordinate system via a flow model. We also consider generating the variables of interest sequentially to capture the underlying dependencies among them. Experiments demonstrate that our GraphBP is effective to generate 3D molecules with binding ability to target protein binding sites. Our implementation is available at https://github.com/divelab/GraphBP.' 
volume: 162 URL: https://proceedings.mlr.press/v162/liu22m.html PDF: https://proceedings.mlr.press/v162/liu22m/liu22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Meng family: Liu - given: Youzhi family: Luo - given: Kanji family: Uchino - given: Koji family: Maruhashi - given: Shuiwang family: Ji editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13912-13924 id: liu22m issued: date-parts: - 2022 - 6 - 28 firstpage: 13912 lastpage: 13924 published: 2022-06-28 00:00:00 +0000 - title: 'Communication-efficient Distributed Learning for Large Batch Optimization' abstract: 'Many communication-efficient methods have been proposed for distributed learning, whereby gradient compression is used to reduce the communication cost. However, given recent advances in large batch optimization (e.g., large batch SGD and its variant LARS with layerwise adaptive learning rates), the compute power of each machine is being fully utilized. This means, in modern distributed learning, the per-machine computation cost is no longer negligible compared to the communication cost. In this paper, we propose new gradient compression methods for large batch optimization, JointSpar and its variant JointSpar-LARS with layerwise adaptive learning rates, that jointly reduce both the computation and the communication cost. To achieve this, we take advantage of the redundancy in the gradient computation, unlike existing methods, which compute all coordinates of the gradient vector even if some coordinates are later dropped for communication efficiency. JointSpar and its variant further reduce the training time by avoiding the wasted computation on dropped coordinates. While computationally more efficient, we prove that JointSpar and its variant also maintain the same convergence rates as their respective baseline methods. Extensive experiments show that, by reducing the time per iteration, our methods converge faster than state-of-the-art compression methods in terms of wall-clock time.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22n.html PDF: https://proceedings.mlr.press/v162/liu22n/liu22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rui family: Liu - given: Barzan family: Mozafari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13925-13946 id: liu22n issued: date-parts: - 2022 - 6 - 28 firstpage: 13925 lastpage: 13946 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction' abstract: 'In this paper, we study the finite-sum convex optimization problem focusing on the general convex case. Recently, the study of variance reduced (VR) methods and their accelerated variants has made exciting progress.
However, the step size used in the existing VR algorithms typically depends on the smoothness parameter, which is often unknown and requires tuning in practice. To address this problem, we propose two novel adaptive VR algorithms: Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG). Our algorithms do not require knowledge of the smoothness parameter. AdaVRAE uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta}{\epsilon}}\right)$ and AdaVRAG uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta\log\beta}{\epsilon}}\right)$ gradient evaluations to attain an $\mathcal{O}(\epsilon)$-suboptimal solution, where $n$ is the number of functions in the finite sum and $\beta$ is the smoothness parameter. This result matches the best-known convergence rate of non-adaptive VR methods and it improves upon the convergence of the state of the art adaptive VR method, AdaSVRG. We demonstrate the superior performance of our algorithms compared with previous methods in experiments on real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22o.html PDF: https://proceedings.mlr.press/v162/liu22o/liu22o.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22o.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zijian family: Liu - given: Ta Duy family: Nguyen - given: Alina family: Ene - given: Huy family: Nguyen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13947-13994 id: liu22o issued: date-parts: - 2022 - 6 - 28 firstpage: 13947 lastpage: 13994 published: 2022-06-28 00:00:00 +0000 - title: 'REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer' abstract: 'A popular paradigm in robotic learning is to train a policy from scratch for every new robot. This is not only inefficient but also often impractical for complex robots. In this work, we consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail due to optimal action and/or state distribution being mismatched in different robots. In this paper, we propose a novel method named REvolveR of using continuous evolutionary models for robotic policy transfer implemented in a physics simulator. We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters. An expert policy on the source robot is transferred through training on a sequence of intermediate robots that gradually evolve into the target robot. Experiments on a physics simulator show that the proposed continuous evolutionary model can effectively transfer the policy across robots and achieve superior sample efficiency on new robots. The proposed method is especially advantageous in sparse reward settings where exploration can be significantly reduced.' 
volume: 162 URL: https://proceedings.mlr.press/v162/liu22p.html PDF: https://proceedings.mlr.press/v162/liu22p/liu22p.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22p.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xingyu family: Liu - given: Deepak family: Pathak - given: Kris family: Kitani editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 13995-14007 id: liu22p issued: date-parts: - 2022 - 6 - 28 firstpage: 13995 lastpage: 14007 published: 2022-06-28 00:00:00 +0000 - title: 'Kill a Bird with Two Stones: Closing the Convergence Gaps in Non-Strongly Convex Optimization by Directly Accelerated SVRG with Double Compensation and Snapshots' abstract: 'Recently, some accelerated stochastic variance reduction algorithms such as Katyusha and ASVRG-ADMM achieve faster convergence than non-accelerated methods such as SVRG and SVRG-ADMM. However, there are still some gaps between the oracle complexities and their lower bounds. To fill in these gaps, this paper proposes a novel Directly Accelerated stochastic Variance reductIon (DAVIS) algorithm with two Snapshots for non-strongly convex (non-SC) unconstrained problems. Our theoretical results show that DAVIS achieves the optimal convergence rate O(1/(nS^2)) and optimal gradient complexity O(n+\sqrt{nL/\epsilon}), which is identical to its lower bound. To the best of our knowledge, this is the first directly accelerated algorithm that attains the optimal lower bound and improves the convergence rate from O(1/S^2) to O(1/(nS^2)). Moreover, we extend DAVIS and theoretical results to non-SC problems with a structured regularizer, and prove that the proposed algorithm with double-snapshots also attains the optimal convergence rate O(1/(nS)) and optimal oracle complexity O(n+L/\epsilon) for such problems, and it is at least a factor n/S faster than existing accelerated stochastic algorithms, where n\gg S in general.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22q.html PDF: https://proceedings.mlr.press/v162/liu22q/liu22q.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22q.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuanyuan family: Liu - given: Fanhua family: Shang - given: Weixin family: An - given: Hongying family: Liu - given: Zhouchen family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14008-14035 id: liu22q issued: date-parts: - 2022 - 6 - 28 firstpage: 14008 lastpage: 14035 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits' abstract: 'An ideal strategy in zero-sum games should not only grant the player an average reward no less than the value of Nash equilibrium, but also exploit the (adaptive) opponents when they are suboptimal. While most existing works in Markov games focus exclusively on the former objective, it remains open whether we can achieve both objectives simultaneously. 
To address this problem, this work studies no-regret learning in Markov games with adversarial opponents when competing against the best fixed policy in hindsight. Along this direction, we present a new complete set of positive and negative results: When the policies of the opponents are revealed at the end of each episode, we propose new efficient algorithms achieving $\sqrt{K}$ regret bounds when either (1) the baseline policy class is small or (2) the opponent’s policy class is small. This is complemented with an exponential lower bound when neither condition is true. When the policies of the opponents are not revealed, we prove a statistical hardness result even in the most favorable scenario when both of the above conditions are true. Our hardness result is much stronger than the existing hardness results, which either only involve computational hardness or require further restrictions on the algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22r.html PDF: https://proceedings.mlr.press/v162/liu22r/liu22r.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22r.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qinghua family: Liu - given: Yuanhao family: Wang - given: Chi family: Jin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14036-14053 id: liu22r issued: date-parts: - 2022 - 6 - 28 firstpage: 14036 lastpage: 14053 published: 2022-06-28 00:00:00 +0000 - title: 'Local Augmentation for Graph Neural Networks' abstract: 'Graph Neural Networks (GNNs) have achieved remarkable performance on graph-based tasks. The key idea for GNNs is to obtain informative representation through aggregating information from local neighborhoods. However, it remains an open question whether the neighborhood information is adequately aggregated for learning representations of nodes with few neighbors. To address this, we propose a simple and efficient data augmentation strategy, local augmentation, to learn the distribution of the node representations of the neighbors conditioned on the central node’s representation and enhance GNN’s expressive power with generated features. Local augmentation is a general framework that can be applied to any GNN model in a plug-and-play manner. It samples feature vectors associated with each node from the learned conditional distribution as additional input for the backbone model at each training iteration. Extensive experiments and analyses show that local augmentation consistently yields performance improvement when applied to various GNN architectures across a diverse set of benchmarks. For example, experiments show that plugging local augmentation into GCN and GAT improves test accuracy by an average of 3.4% and 1.6% on Cora, Citeseer, and Pubmed. Besides, our experimental results on large graphs (OGB) show that our model consistently improves performance over backbones. Code is available at https://github.com/SongtaoLiu0823/LAGNN.'
volume: 162 URL: https://proceedings.mlr.press/v162/liu22s.html PDF: https://proceedings.mlr.press/v162/liu22s/liu22s.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22s.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Songtao family: Liu - given: Rex family: Ying - given: Hanze family: Dong - given: Lanqing family: Li - given: Tingyang family: Xu - given: Yu family: Rong - given: Peilin family: Zhao - given: Junzhou family: Huang - given: Dinghao family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14054-14072 id: liu22s issued: date-parts: - 2022 - 6 - 28 firstpage: 14054 lastpage: 14072 published: 2022-06-28 00:00:00 +0000 - title: 'Asking for Knowledge (AFK): Training RL Agents to Query External Knowledge Using Language' abstract: 'To solve difficult tasks, humans ask questions to acquire knowledge from external sources. In contrast, classical reinforcement learning agents lack such an ability and often resort to exploratory behavior. This is exacerbated as few present-day environments support querying for knowledge. In order to study how agents can be taught to query external knowledge via language, we first introduce two new environments: the grid-world-based Q-BabyAI and the text-based Q-TextWorld. In addition to physical interactions, an agent can query an external knowledge source specialized for these environments to gather information. Second, we propose the ‘Asking for Knowledge’ (AFK) agent, which learns to generate language commands to query for meaningful knowledge that helps solve the tasks. AFK leverages a non-parametric memory, a pointer mechanism and an episodic exploration bonus to tackle (1) irrelevant information, (2) a large query language space, (3) delayed reward for making meaningful queries. Extensive experiments demonstrate that the AFK agent outperforms recent baselines on the challenging Q-BabyAI and Q-TextWorld environments.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22t.html PDF: https://proceedings.mlr.press/v162/liu22t/liu22t.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22t.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Iou-Jen family: Liu - given: Xingdi family: Yuan - given: Marc-Alexandre family: Côté - given: Pierre-Yves family: Oudeyer - given: Alexander family: Schwing editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14073-14093 id: liu22t issued: date-parts: - 2022 - 6 - 28 firstpage: 14073 lastpage: 14093 published: 2022-06-28 00:00:00 +0000 - title: 'Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation' abstract: 'In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. 
In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy imitation algorithm (OGAPI) and prove that OGAPI achieves $\widetilde{\mathcal{O}}(\sqrt{H^4d^3K}+\sqrt{H^3d^2K^2/N_1})$ regret. Here $N_1$ represents the number of trajectories of the expert demonstration, $d$ is the feature dimension, and $K$ is the number of episodes. For offline GAIL, we propose a pessimistic generative adversarial policy imitation algorithm (PGAPI). We also obtain the optimality gap of PGAPI, achieving the minimax lower bound in the utilization of the additional dataset. Assuming sufficient coverage on the additional dataset, we show that PGAPI achieves $\widetilde{\mathcal{O}}(\sqrt{H^4d^2/K}+\sqrt{H^4d^3/N_2}+\sqrt{H^3d^2/N_1})$ optimality gap. Here $N_2$ represents the number of trajectories of the additional dataset with sufficient coverage.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22u.html PDF: https://proceedings.mlr.press/v162/liu22u/liu22u.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22u.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhihan family: Liu - given: Yufeng family: Zhang - given: Zuyue family: Fu - given: Zhuoran family: Yang - given: Zhaoran family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14094-14138 id: liu22u issued: date-parts: - 2022 - 6 - 28 firstpage: 14094 lastpage: 14138 published: 2022-06-28 00:00:00 +0000 - title: 'GACT: Activation Compressed Training for Generic Network Architectures' abstract: 'Training large neural network (NN) models requires extensive memory resources, and Activation Compression Training (ACT) is a promising approach to reduce training memory footprint. This paper presents GACT, an ACT framework to support a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT’s approximate gradient, we prove the convergence of GACT without prior knowledge on operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with negligible accuracy loss.' 
volume: 162 URL: https://proceedings.mlr.press/v162/liu22v.html PDF: https://proceedings.mlr.press/v162/liu22v/liu22v.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22v.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaoxuan family: Liu - given: Lianmin family: Zheng - given: Dequan family: Wang - given: Yukuo family: Cen - given: Weize family: Chen - given: Xu family: Han - given: Jianfei family: Chen - given: Zhiyuan family: Liu - given: Jie family: Tang - given: Joey family: Gonzalez - given: Michael family: Mahoney - given: Alvin family: Cheung editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14139-14152 id: liu22v issued: date-parts: - 2022 - 6 - 28 firstpage: 14139 lastpage: 14152 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Training under Label Noise by Over-parameterization' abstract: 'Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performances of modern machine learning. However, when the training data is corrupted, it has been well-known that over-parameterized networks tend to overfit and do not generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is yet very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise via another sparse over-parameterization term, and exploit implicit algorithmic regularizations to recover and separate the underlying corruptions. Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, showing that exact separation between sparse noise and low-rank data can be achieved under incoherent conditions. The work opens many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization. Code is available at https://github.com/shengliu66/SOP.' 
volume: 162 URL: https://proceedings.mlr.press/v162/liu22w.html PDF: https://proceedings.mlr.press/v162/liu22w/liu22w.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22w.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sheng family: Liu - given: Zhihui family: Zhu - given: Qing family: Qu - given: Chong family: You editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14153-14172 id: liu22w issued: date-parts: - 2022 - 6 - 28 firstpage: 14153 lastpage: 14172 published: 2022-06-28 00:00:00 +0000 - title: 'Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization' abstract: 'Recent progress in state-only imitation learning extends the scope of applicability of imitation learning to real-world settings by relieving the need for observing expert actions. However, existing solutions only learn to extract a state-to-action mapping policy from the data, without considering how the expert plans to the target. This hinders the ability to leverage demonstrations and limits the flexibility of the policy. In this paper, we introduce Decoupled Policy Optimization (DePO), which explicitly decouples the policy as a high-level state planner and an inverse dynamics model. With embedded decoupled policy gradient and generative adversarial training, DePO enables knowledge transfer to different action spaces or state transition dynamics, and can generalize the planner to out-of-demonstration state regions. Our in-depth experimental analysis shows the effectiveness of DePO on learning a generalized target state planner while achieving the best imitation performance. We demonstrate the appealing usage of DePO for transferring across different tasks by pre-training, and the potential for co-training agents with various skills.' volume: 162 URL: https://proceedings.mlr.press/v162/liu22x.html PDF: https://proceedings.mlr.press/v162/liu22x/liu22x.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-liu22x.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Minghuan family: Liu - given: Zhengbang family: Zhu - given: Yuzheng family: Zhuang - given: Weinan family: Zhang - given: Jianye family: Hao - given: Yong family: Yu - given: Jun family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14173-14196 id: liu22x issued: date-parts: - 2022 - 6 - 28 firstpage: 14173 lastpage: 14196 published: 2022-06-28 00:00:00 +0000 - title: 'On the Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games' abstract: 'Learning to cooperate with other agents is challenging when those agents also possess the ability to adapt to our own behavior. Practical and theoretical approaches to learning in cooperative settings typically assume that other agents’ behaviors are stationary, or else make very specific assumptions about other agents’ learning processes. 
The goal of this work is to understand whether we can reliably learn to cooperate with other agents without such restrictive assumptions, which are unlikely to hold in real-world applications. Our main contribution is a set of impossibility results, which show that no learning algorithm can reliably learn to cooperate with all possible adaptive partners in a repeated matrix game, even if that partner is guaranteed to cooperate with some stationary strategy. Motivated by these results, we then discuss potential alternative assumptions which capture the idea that an adaptive partner will only adapt rationally to our behavior.' volume: 162 URL: https://proceedings.mlr.press/v162/loftin22a.html PDF: https://proceedings.mlr.press/v162/loftin22a/loftin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-loftin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Robert family: Loftin - given: Frans A family: Oliehoek editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14197-14209 id: loftin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14197 lastpage: 14209 published: 2022-06-28 00:00:00 +0000 - title: 'AutoIP: A United Framework to Integrate Physics into Gaussian Processes' abstract: 'Physical modeling is critical for many modern science and engineering applications. From a data science or machine learning perspective, where more domain-agnostic, data-driven models are pervasive, physical knowledge {—} often expressed as differential equations {—} is valuable in that it is complementary to data, and it can potentially help overcome issues such as data sparsity, noise, and inaccuracy. In this work, we propose a simple, yet powerful and general framework {—} AutoIP, for Automatically Incorporating Physics {—} that can integrate all kinds of differential equations into Gaussian Processes (GPs) to enhance prediction accuracy and uncertainty quantification. These equations can be linear or nonlinear, spatial, temporal, or spatio-temporal, complete or incomplete with unknown source terms, and so on. Based on kernel differentiation, we construct a GP prior to sample the values of the target function, equation related derivatives, and latent source functions, which are all jointly from a multivariate Gaussian distribution. The sampled values are fed to two likelihoods: one to fit the observations, and the other to conform to the equation. We use the whitening method to evade the strong dependency between the sampled function values and kernel parameters, and we develop a stochastic variational learning algorithm. AutoIP shows improvement upon vanilla GPs in both simulation and several real-world applications, even using rough, incomplete equations.' 
volume: 162 URL: https://proceedings.mlr.press/v162/long22a.html PDF: https://proceedings.mlr.press/v162/long22a/long22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-long22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Da family: Long - given: Zheng family: Wang - given: Aditi family: Krishnapriyan - given: Robert family: Kirby - given: Shandian family: Zhe - given: Michael family: Mahoney editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14210-14222 id: long22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14210 lastpage: 14222 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Model Selection, the Marginal Likelihood, and Generalization' abstract: 'How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam’s razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.' volume: 162 URL: https://proceedings.mlr.press/v162/lotfi22a.html PDF: https://proceedings.mlr.press/v162/lotfi22a/lotfi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lotfi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sanae family: Lotfi - given: Pavel family: Izmailov - given: Gregory family: Benton - given: Micah family: Goldblum - given: Andrew Gordon family: Wilson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14223-14247 id: lotfi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14223 lastpage: 14247 published: 2022-06-28 00:00:00 +0000 - title: 'Feature Learning and Signal Propagation in Deep Neural Networks' abstract: 'Recent work by Baratin et al. (2021) sheds light on an intriguing pattern that occurs during the training of deep neural networks: some layers align much more with data compared to other layers (where the alignment is defined as the normalized Euclidean product of the tangent features matrix and the data labels matrix).
The curve of the alignment as a function of layer index (generally) exhibits an ascent-descent pattern where the maximum is reached for some hidden layer. In this work, we provide the first explanation for this phenomenon. We introduce the Equilibrium Hypothesis, which connects this alignment pattern to signal propagation in deep neural networks. Our experiments demonstrate an excellent match with the theoretical predictions.' volume: 162 URL: https://proceedings.mlr.press/v162/lou22a.html PDF: https://proceedings.mlr.press/v162/lou22a/lou22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lou22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yizhang family: Lou - given: Chris E family: Mingard - given: Soufiane family: Hayou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14248-14282 id: lou22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14248 lastpage: 14282 published: 2022-06-28 00:00:00 +0000 - title: 'Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension' abstract: 'From the sampling of data to the initialisation of parameters, randomness is ubiquitous in modern Machine Learning practice. Understanding the statistical fluctuations engendered by the different sources of randomness in prediction is therefore key to understanding robust generalisation. In this manuscript we develop a quantitative and rigorous theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features in high-dimensions. In particular, we provide a complete description of the asymptotic joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit. Our result encompasses a rich set of classification and regression tasks, such as the lazy regime of overparametrised neural networks, or equivalently the random features approximation of kernels. While allowing us to study directly the mitigating effect of ensembling (or bagging) on the bias-variance decomposition of the test error, our analysis also helps disentangle the contribution of statistical fluctuations, and the singular role played by the interpolation threshold that are at the roots of the “double-descent” phenomenon.'
volume: 162 URL: https://proceedings.mlr.press/v162/loureiro22a.html PDF: https://proceedings.mlr.press/v162/loureiro22a/loureiro22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-loureiro22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bruno family: Loureiro - given: Cedric family: Gerbelot - given: Maria family: Refinetti - given: Gabriele family: Sicuro - given: Florent family: Krzakala editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14283-14314 id: loureiro22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14283 lastpage: 14314 published: 2022-06-28 00:00:00 +0000 - title: 'A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization' abstract: 'Nonconvex constrained optimization problems can be used to model a number of machine learning problems, such as multi-class Neyman-Pearson classification and constrained Markov decision processes. However, such kinds of problems are challenging because both the objective and constraints are possibly nonconvex, so it is difficult to balance the reduction of the loss value and reduction of constraint violation. Although there are a few methods that solve this class of problems, all of them are double-loop or triple-loop algorithms, and they require oracles to solve some subproblems up to certain accuracy by tuning multiple hyperparameters at each iteration. In this paper, we propose a novel gradient descent and perturbed ascent (GDPA) algorithm to solve a class of smooth nonconvex inequality constrained problems. The GDPA is a primal-dual algorithm, which only exploits the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way. The key feature of the proposed algorithm is that it is a single-loop algorithm, where only two step-sizes need to be tuned. We show that under a mild regularity condition GDPA is able to find Karush-Kuhn-Tucker (KKT) points of nonconvex functional constrained problems with convergence rate guarantees. To the best of our knowledge, it is the first single-loop algorithm that can solve the general nonconvex smooth problems with nonconvex inequality constraints. Numerical results also showcase the superiority of GDPA compared with the best-known algorithms (in terms of both stationarity measure and feasibility of the obtained solutions).' 
volume: 162 URL: https://proceedings.mlr.press/v162/lu22a.html PDF: https://proceedings.mlr.press/v162/lu22a/lu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Songtao family: Lu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14315-14357 id: lu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14315 lastpage: 14357 published: 2022-06-28 00:00:00 +0000 - title: 'Additive Gaussian Processes Revisited' abstract: 'Gaussian Process (GP) models are a class of flexible non-parametric models that have rich representational power. By using a Gaussian process with additive structure, complex responses can be modelled whilst retaining interpretability. Previous work showed that additive Gaussian process models require high-dimensional interaction terms. We propose the orthogonal additive kernel (OAK), which imposes an orthogonality constraint on the additive functions, enabling an identifiable, low-dimensional representation of the functional relationship. We connect the OAK kernel to functional ANOVA decomposition, and show improved convergence rates for sparse computation methods. With only a small number of additive low-dimensional terms, we demonstrate the OAK model achieves similar or better predictive performance compared to black-box models, while retaining interpretability.' volume: 162 URL: https://proceedings.mlr.press/v162/lu22b.html PDF: https://proceedings.mlr.press/v162/lu22b/lu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaoyu family: Lu - given: Alexis family: Boukouvalas - given: James family: Hensman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14358-14383 id: lu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 14358 lastpage: 14383 published: 2022-06-28 00:00:00 +0000 - title: 'ModLaNets: Learning Generalisable Dynamics via Modularity and Physical Inductive Bias' abstract: 'Deep learning models are able to approximate one specific dynamical system but struggle at learning generalisable dynamics, where dynamical systems obey the same laws of physics but contain different numbers of elements (e.g., double- and triple-pendulum systems). To relieve this issue, we propose the Modular Lagrangian Network (ModLaNet), a structural neural network framework with modularity and physical inductive bias. This framework models the energy of each element using modularity and then constructs the target dynamical system via Lagrangian mechanics. Modularity is beneficial for reusing trained networks and reducing the scale of networks and datasets. As a result, our framework can learn from the dynamics of simpler systems and extend to more complex ones, which is not feasible using other relevant physics-informed neural networks.
We examine our framework for modelling double-pendulum or three-body systems with small training datasets, where our models achieve the best data efficiency and accuracy performance compared with counterparts. We also reorganise our models as extensions to model multi-pendulum and multi-body systems, demonstrating the intriguing reusable feature of our framework.' volume: 162 URL: https://proceedings.mlr.press/v162/lu22c.html PDF: https://proceedings.mlr.press/v162/lu22c/lu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yupu family: Lu - given: Shijie family: Lin - given: Guanqi family: Chen - given: Jia family: Pan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14384-14397 id: lu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 14384 lastpage: 14397 published: 2022-06-28 00:00:00 +0000 - title: 'Model-Free Opponent Shaping' abstract: 'In general-sum games the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner’s dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), directly shape the learning process of their opponents. However, these methods are myopic since only a small number of steps can be anticipated, are asymmetric since they treat other agents as naive learners, and require the use of higher-order derivatives, which are calculated through white-box access to an opponent’s differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Shaping (M-FOS). M-FOS learns in a meta-game in which each meta-step is an episode of the underlying game. The meta-state consists of the policies in the underlying game and the meta-policy produces a new policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent shaping. Empirically, M-FOS near-optimally exploits naive learners and other, more sophisticated algorithms from the literature. For example, to the best of our knowledge, it is the first method to learn the well-known ZD extortion strategy in the IPD. In the same settings, M-FOS leads to socially optimal outcomes under meta-self-play. Finally, we show that M-FOS can be scaled to high-dimensional settings.' 
volume: 162 URL: https://proceedings.mlr.press/v162/lu22d.html PDF: https://proceedings.mlr.press/v162/lu22d/lu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Christopher family: Lu - given: Timon family: Willi - given: Christian A Schroeder family: De Witt - given: Jakob family: Foerster editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14398-14411 id: lu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 14398 lastpage: 14411 published: 2022-06-28 00:00:00 +0000 - title: 'Multi-slots Online Matching with High Entropy' abstract: 'Online matching with diversity and fairness pursuit, a common building block in recommendation and advertising, can be modeled as constrained convex programming with high entropy. While most existing approaches are based on the “single slot” assumption (i.e., assigning one item per iteration), they cannot be directly applied to cases with multiple slots, e.g., stock-aware top-N recommendation and advertising at multiple places. Particularly, the gradient computation and resource allocation are both challenging under this setting due to the absence of a closed-form solution. To overcome these obstacles, we develop a novel algorithm named Online subGradient descent for Multi-slots Allocation (OG-MA). It uses an efficient pooling algorithm to compute the closed-form gradient and then performs roulette swapping for allocation, yielding a sub-linear regret with linear cost per iteration. Extensive experiments on synthetic and industrial data sets demonstrate that OG-MA is a fast and promising method for multi-slots online matching.' volume: 162 URL: https://proceedings.mlr.press/v162/lu22e.html PDF: https://proceedings.mlr.press/v162/lu22e/lu22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lu22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xingyu family: Lu - given: Qintong family: Wu - given: Wenliang family: Zhong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14412-14428 id: lu22e issued: date-parts: - 2022 - 6 - 28 firstpage: 14412 lastpage: 14428 published: 2022-06-28 00:00:00 +0000 - title: 'Maximum Likelihood Training for Score-based Diffusion ODEs by High Order Denoising Score Matching' abstract: 'Score-based generative models have excellent performance in terms of generation quality and likelihood. They model the data distribution by matching a parameterized score network with first-order data score functions. The score network can be used to define an ODE (“score-based diffusion ODE”) for exact likelihood evaluation. However, the relationship between the likelihood of the ODE and the score matching objective is unclear. In this work, we prove that matching the first-order score is not sufficient to maximize the likelihood of the ODE, by showing a gap between the maximum likelihood and score matching objectives.
To fill up this gap, we show that the negative likelihood of the ODE can be bounded by controlling the first, second, and third-order score matching errors; and we further present a novel high-order denoising score matching method to enable maximum likelihood training of score-based diffusion ODEs. Our algorithm guarantees that the higher-order matching error is bounded by the training error and the lower-order errors. We empirically observe that by high-order score matching, score-based diffusion ODEs achieve better likelihood on both synthetic data and CIFAR-10, while retaining the high generation quality.' volume: 162 URL: https://proceedings.mlr.press/v162/lu22f.html PDF: https://proceedings.mlr.press/v162/lu22f/lu22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lu22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Cheng family: Lu - given: Kaiwen family: Zheng - given: Fan family: Bao - given: Jianfei family: Chen - given: Chongxuan family: Li - given: Jun family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14429-14460 id: lu22f issued: date-parts: - 2022 - 6 - 28 firstpage: 14429 lastpage: 14460 published: 2022-06-28 00:00:00 +0000 - title: 'Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering' abstract: 'Federated learning is generally used in tasks where labels are readily available (e.g., next word prediction). Relaxing this constraint requires design of unsupervised learning techniques that can support desirable properties for federated training: robustness to statistical/systems heterogeneity, scalability with number of participants, and communication efficiency. Prior work on this topic has focused on directly extending centralized self-supervised learning techniques, which are not designed to have the properties listed above. To address this situation, we propose Orchestra, a novel unsupervised federated learning technique that exploits the federation’s hierarchy to orchestrate a distributed clustering task and enforce a globally consistent partitioning of clients’ data into discriminable clusters. We show the algorithmic pipeline in Orchestra guarantees good generalization performance under a linear probe, allowing it to outperform alternative techniques in a broad range of conditions, including variation in heterogeneity, number of clients, participation ratio, and local epochs.' 
volume: 162 URL: https://proceedings.mlr.press/v162/lubana22a.html PDF: https://proceedings.mlr.press/v162/lubana22a/lubana22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lubana22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ekdeep family: Lubana - given: Chi Ian family: Tang - given: Fahim family: Kawsar - given: Robert family: Dick - given: Akhil family: Mathur editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14461-14484 id: lubana22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14461 lastpage: 14484 published: 2022-06-28 00:00:00 +0000 - title: 'A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions' abstract: 'As deep learning (DL) efficacy grows, concerns for poor model explainability grow also. Attribution methods address the issue of explainability by quantifying the importance of an input feature for a model prediction. Among various methods, Integrated Gradients (IG) sets itself apart by claiming other methods failed to satisfy desirable axioms, while IG and methods like it uniquely satisfy said axioms. This paper comments on fundamental aspects of IG and its applications/extensions: 1) We identify key differences between IG function spaces and the supporting literature’s function spaces which problematize previous claims of IG uniqueness. We show that with the introduction of an additional axiom, non-decreasing positivity, the uniqueness claims can be established. 2) We address the question of input sensitivity by identifying function classes where IG is/is not Lipschitz in the attributed input. 3) We show that axioms for single-baseline methods have analogous properties for methods with probability distribution baselines. 4) We introduce a computationally efficient method of identifying internal neurons that contribute to specified regions of an IG attribution map. Finally, we present experimental results validating this method.' volume: 162 URL: https://proceedings.mlr.press/v162/lundstrom22a.html PDF: https://proceedings.mlr.press/v162/lundstrom22a/lundstrom22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lundstrom22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniel D family: Lundstrom - given: Tianjian family: Huang - given: Meisam family: Razaviyayn editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14485-14508 id: lundstrom22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14485 lastpage: 14508 published: 2022-06-28 00:00:00 +0000 - title: 'BAMDT: Bayesian Additive Semi-Multivariate Decision Trees for Nonparametric Regression' abstract: 'Bayesian additive regression trees (BART; Chipman et al., 2010) have gained great popularity as a flexible nonparametric function estimation and modeling tool. Nearly all existing BART models rely on decision tree weak learners with axis-parallel univariate split rules to partition the Euclidean feature space into rectangular regions. 
In practice, however, many regression problems involve features with multivariate structures (e.g., spatial locations) possibly lying in a manifold, where rectangular partitions may fail to respect irregular intrinsic geometry and boundary constraints of the structured feature space. In this paper, we develop a new class of Bayesian additive multivariate decision tree models that combine univariate split rules for handling possibly high dimensional features without known multivariate structures and novel multivariate split rules for features with multivariate structures in each weak learner. The proposed multivariate split rules are built upon stochastic predictive spanning tree bipartition models on reference knots, which are capable of achieving highly flexible nonlinear decision boundaries on manifold feature spaces while enabling efficient dimension reduction computations. We demonstrate the superior performance of the proposed method using simulation data and a Sacramento housing price data set.' volume: 162 URL: https://proceedings.mlr.press/v162/luo22a.html PDF: https://proceedings.mlr.press/v162/luo22a/luo22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-luo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhao Tang family: Luo - given: Huiyan family: Sang - given: Bani family: Mallick editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14509-14526 id: luo22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14509 lastpage: 14526 published: 2022-06-28 00:00:00 +0000 - title: 'Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring' abstract: 'Attributes skew hinders the current federated learning (FL) frameworks from consistent optimization directions among the clients, which inevitably leads to performance reduction and unstable convergence. The core problems are as follows: 1) Domain-specific attributes, which are non-causal and only locally valid, are inadvertently mixed into global aggregation. 2) The one-stage optimizations of entangled attributes cannot simultaneously satisfy two conflicting objectives, i.e., generalization and personalization. To cope with these, we propose disentangled federated learning (DFL) to disentangle the domain-specific and cross-invariant attributes into two complementary branches, which are trained independently by the proposed alternating local-global optimization. Importantly, convergence analysis proves that the FL system can converge stably even if incomplete client models participate in the global aggregation, which greatly expands the application scope of FL. Extensive experiments verify that DFL facilitates FL with higher performance, better interpretability, and faster convergence rate, compared with SOTA FL methods on both manually synthesized and realistic attributes skew datasets.'
volume: 162 URL: https://proceedings.mlr.press/v162/luo22b.html PDF: https://proceedings.mlr.press/v162/luo22b/luo22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-luo22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhengquan family: Luo - given: Yunlong family: Wang - given: Zilei family: Wang - given: Zhenan family: Sun - given: Tieniu family: Tan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14527-14541 id: luo22b issued: date-parts: - 2022 - 6 - 28 firstpage: 14527 lastpage: 14541 published: 2022-06-28 00:00:00 +0000 - title: 'Channel Importance Matters in Few-Shot Image Classification' abstract: 'Few-Shot Learning (FSL) requires vision models to quickly adapt to brand-new classification tasks with a shift in task distribution. Understanding the difficulties posed by this task distribution shift is central to FSL. In this paper, we show that a simple channel-wise feature transformation may be the key to unraveling this secret from a channel perspective. When facing novel few-shot tasks in the test-time datasets, this transformation can greatly improve the generalization ability of learned image representations, while being agnostic to the choice of datasets and training algorithms. Through an in-depth analysis of this transformation, we find that the difficulty of representation transfer in FSL stems from the severe channel bias problem of image representations: channels may have different importance in different tasks, while convolutional neural networks are likely to be insensitive, or respond incorrectly to such a shift. This points out a core problem of the generalization ability of modern vision systems which needs further attention in the future.' volume: 162 URL: https://proceedings.mlr.press/v162/luo22c.html PDF: https://proceedings.mlr.press/v162/luo22c/luo22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-luo22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xu family: Luo - given: Jing family: Xu - given: Zenglin family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14542-14559 id: luo22c issued: date-parts: - 2022 - 6 - 28 firstpage: 14542 lastpage: 14559 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Dynamics and Generalization in Deep Reinforcement Learning' abstract: 'Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal difference algorithms to gain novel insight into the tension between these two objectives. We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training, and at the same time induces the second-order effect of discouraging generalization. 
We corroborate these findings in deep RL agents trained on a range of environments, finding that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods. Finally, we investigate how post-training policy distillation may avoid this pitfall, and show that this approach improves generalization to novel environments in the ProcGen suite and improves robustness to input perturbations.' volume: 162 URL: https://proceedings.mlr.press/v162/lyle22a.html PDF: https://proceedings.mlr.press/v162/lyle22a/lyle22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lyle22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Clare family: Lyle - given: Mark family: Rowland - given: Will family: Dabney - given: Marta family: Kwiatkowska - given: Yarin family: Gal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14560-14581 id: lyle22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14560 lastpage: 14581 published: 2022-06-28 00:00:00 +0000 - title: 'On Finite-Sample Identifiability of Contrastive Learning-Based Nonlinear Independent Component Analysis' abstract: 'Nonlinear independent component analysis (nICA) aims at recovering statistically independent latent components that are mixed by unknown nonlinear functions. Central to nICA is the identifiability of the latent components, which had been elusive until very recently. Specifically, Hyvärinen et al. have shown that the nonlinearly mixed latent components are identifiable (up to often inconsequential ambiguities) under a generalized contrastive learning (GCL) formulation, given that the latent components are independent conditioned on a certain auxiliary variable. The GCL-based identifiability of nICA is elegant, and establishes interesting connections between nICA and popular unsupervised/self-supervised learning paradigms in representation learning, causal learning, and factor disentanglement. However, existing identifiability analyses of nICA all build upon an unlimited sample assumption and the use of ideal universal function learners—which creates a non-negligible gap between theory and practice. Closing the gap is a nontrivial challenge, as there is a lack of established “textbook” routine for finite sample analysis of such unsupervised problems. This work puts forth a finite-sample identifiability analysis of GCL-based nICA. Our analytical framework judiciously combines the properties of the GCL loss function, statistical generalization analysis, and numerical differentiation. Our framework also takes the learning function’s approximation error into consideration, and reveals an intuitive trade-off between the complexity and expressiveness of the employed function learner. Numerical experiments are used to validate the theorems.' 
volume: 162 URL: https://proceedings.mlr.press/v162/lyu22a.html PDF: https://proceedings.mlr.press/v162/lyu22a/lyu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lyu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qi family: Lyu - given: Xiao family: Fu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14582-14600 id: lyu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14582 lastpage: 14600 published: 2022-06-28 00:00:00 +0000 - title: 'Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning' abstract: 'Dynamic mechanism design has garnered significant attention from both computer scientists and economists in recent years. By allowing agents to interact with the seller over multiple rounds, where agents’ reward functions may change with time and are state-dependent, the framework is able to model a rich class of real-world problems. In these works, the interaction between agents and sellers is often assumed to follow a Markov Decision Process (MDP). We focus on the setting where the reward and transition functions of such an MDP are not known a priori, and we are attempting to recover the optimal mechanism using an a priori collected data set. In the setting where the function approximation is employed to handle large state spaces, with only mild assumptions on the expressiveness of the function class, we are able to design a dynamic mechanism using offline reinforcement learning algorithms. Moreover, learned mechanisms approximately have three key desiderata: efficiency, individual rationality, and truthfulness. Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set. To the best of our knowledge, our work provides the first offline RL algorithm for dynamic mechanism design without assuming uniform coverage.' volume: 162 URL: https://proceedings.mlr.press/v162/lyu22b.html PDF: https://proceedings.mlr.press/v162/lyu22b/lyu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-lyu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Boxiang family: Lyu - given: Zhaoran family: Wang - given: Mladen family: Kolar - given: Zhuoran family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14601-14638 id: lyu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 14601 lastpage: 14638 published: 2022-06-28 00:00:00 +0000 - title: 'Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching' abstract: 'We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning algorithm derived via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality and an analytic solution in tabular MDPs. 
Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with a dynamics- or morphology-mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments and high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective for all three problem settings and significantly outperforms the prior state of the art.' volume: 162 URL: https://proceedings.mlr.press/v162/ma22a.html PDF: https://proceedings.mlr.press/v162/ma22a/ma22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ma22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yecheng family: Ma - given: Andrew family: Shen - given: Dinesh family: Jayaraman - given: Osbert family: Bastani editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14639-14663 id: ma22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14639 lastpage: 14663 published: 2022-06-28 00:00:00 +0000 - title: 'Quantification and Analysis of Layer-wise and Pixel-wise Information Discarding' abstract: 'This paper presents a method to explain how the information of each input variable is gradually discarded during the forward propagation in a deep neural network (DNN), which provides new perspectives for explaining DNNs. We define two types of entropy-based metrics, i.e., (1) the discarding of pixel-wise information used in the forward propagation, and (2) the uncertainty of the input reconstruction, to measure input information contained by a specific layer from two perspectives. Unlike previous attribution metrics, the proposed metrics ensure the fairness of comparisons between different layers of different DNNs. We can use these metrics to analyze the efficiency of information processing in DNNs, which exhibits strong connections to the performance of DNNs. We analyze information discarding in a pixel-wise manner, which is different from the information bottleneck theory measuring feature information w.r.t. the sample distribution. Experiments have shown the effectiveness of our metrics in analyzing classic DNNs and explaining existing deep-learning techniques. The code is available at https://github.com/haotianSustc/deepinfo.'
volume: 162 URL: https://proceedings.mlr.press/v162/ma22b.html PDF: https://proceedings.mlr.press/v162/ma22b/ma22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ma22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haotian family: Ma - given: Hao family: Zhang - given: Fan family: Zhou - given: Yinqing family: Zhang - given: Quanshi family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14664-14698 id: ma22b issued: date-parts: - 2022 - 6 - 28 firstpage: 14664 lastpage: 14698 published: 2022-06-28 00:00:00 +0000 - title: 'Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings' abstract: 'We study the effects of constrained optimization formulations and Frank-Wolfe algorithms for obtaining interpretable neural network predictions. Reformulating the Rate-Distortion Explanations (RDE) method for relevance attribution as a constrained optimization problem provides precise control over the sparsity of relevance maps. This enables a novel multi-rate as well as a relevance-ordering variant of RDE that both empirically outperform standard RDE and other baseline methods in a well-established comparison test. We showcase several deterministic and stochastic variants of the Frank-Wolfe algorithm and their effectiveness for RDE.' volume: 162 URL: https://proceedings.mlr.press/v162/macdonald22a.html PDF: https://proceedings.mlr.press/v162/macdonald22a/macdonald22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-macdonald22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jan family: Macdonald - given: Mathieu E. family: Besançon - given: Sebastian family: Pokutta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14699-14716 id: macdonald22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14699 lastpage: 14716 published: 2022-06-28 00:00:00 +0000 - title: 'A Tighter Analysis of Spectral Clustering, and Beyond' abstract: 'This work studies the classical spectral clustering algorithm which embeds the vertices of some graph G=(V_G, E_G) into R^k using k eigenvectors of some matrix of G, and applies k-means to partition V_G into k clusters. Our first result is a tighter analysis on the performance of spectral clustering, and explains why it works under some much weaker condition than the ones studied in the literature. For the second result, we show that, by applying fewer than k eigenvectors to construct the embedding, spectral clustering is able to produce better output for many practical instances; this result is the first of its kind in spectral clustering. Besides its conceptual and theoretical significance, the practical impact of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets, in which spectral clustering produces comparable or better results with fewer than k eigenvectors.' 
volume: 162 URL: https://proceedings.mlr.press/v162/macgregor22a.html PDF: https://proceedings.mlr.press/v162/macgregor22a/macgregor22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-macgregor22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peter family: Macgregor - given: He family: Sun editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14717-14742 id: macgregor22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14717 lastpage: 14742 published: 2022-06-28 00:00:00 +0000 - title: 'Zero-Shot Reward Specification via Grounded Natural Language' abstract: 'Reward signals in reinforcement learning are expensive to design and often require access to the true state which is not available in the real world. Common alternatives are usually demonstrations or goal images which can be labor-intensive to collect. On the other hand, text descriptions provide a general, natural, and low-effort way of communicating the desired task. However, prior works in learning text-conditioned policies still rely on rewards that are defined using either true state or labeled expert demonstrations. We use recent developments in building large-scale visuolanguage models like CLIP to devise a framework that generates the task reward signal just from goal text description and raw pixel observations which is then used to learn the task policy. We evaluate the proposed framework on control and robotic manipulation tasks. Finally, we distill the individual task policies into a single goal text conditioned policy that can generalize in a zero-shot manner to new tasks with unseen objects and unseen goal text descriptions.' volume: 162 URL: https://proceedings.mlr.press/v162/mahmoudieh22a.html PDF: https://proceedings.mlr.press/v162/mahmoudieh22a/mahmoudieh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mahmoudieh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Parsa family: Mahmoudieh - given: Deepak family: Pathak - given: Trevor family: Darrell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14743-14752 id: mahmoudieh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14743 lastpage: 14752 published: 2022-06-28 00:00:00 +0000 - title: 'Feature selection using e-values' abstract: 'In the context of supervised learning, we introduce the concept of e-value. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e. the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. For a p-dimensional feature space, this requires fitting only the full model and evaluating p+1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. The above e-values framework is applicable to a wide range of parametric models. 
We use data depths and a fast resampling-based algorithm to implement a feature selection procedure, providing consistency results. Through experiments across several model settings and synthetic and real datasets, we establish that the e-values can be a promising general alternative to existing model-specific methods of feature selection.' volume: 162 URL: https://proceedings.mlr.press/v162/majumdar22a.html PDF: https://proceedings.mlr.press/v162/majumdar22a/majumdar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-majumdar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Subhabrata family: Majumdar - given: Snigdhansu family: Chatterjee editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14753-14773 id: majumdar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14753 lastpage: 14773 published: 2022-06-28 00:00:00 +0000 - title: 'Knowledge-Grounded Self-Rationalization via Extractive and Natural Language Explanations' abstract: 'Models that generate extractive rationales (i.e., subsets of features) or natural language explanations (NLEs) for their predictions are important for explainable AI. While an extractive rationale provides a quick view of the features most responsible for a prediction, an NLE allows for a comprehensive description of the decision-making process behind a prediction. However, current models that generate the best extractive rationales or NLEs often fall behind the state-of-the-art (SOTA) in terms of task performance. In this work, we bridge this gap by introducing RExC, a self-rationalizing framework that grounds its predictions and two complementary types of explanations (NLEs and extractive rationales) in background knowledge. Our framework improves over previous methods by: (i) reaching SOTA task performance while also providing explanations, (ii) providing two types of explanations, while existing models usually provide only one type, and (iii) beating by a large margin the previous SOTA in terms of quality of both types of explanations. Furthermore, a perturbation analysis in RExC shows a high degree of association between explanations and predictions, a necessary property of faithful explanations.' 
volume: 162 URL: https://proceedings.mlr.press/v162/majumder22a.html PDF: https://proceedings.mlr.press/v162/majumder22a/majumder22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-majumder22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bodhisattwa Prasad family: Majumder - given: Oana family: Camburu - given: Thomas family: Lukasiewicz - given: Julian family: Mcauley editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14786-14801 id: majumder22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14786 lastpage: 14801 published: 2022-06-28 00:00:00 +0000 - title: 'Nonparametric Involutive Markov Chain Monte Carlo' abstract: 'A challenging problem in probabilistic programming is to develop inference algorithms that work for arbitrary programs in a universal probabilistic programming language (PPL). We present the nonparametric involutive Markov chain Monte Carlo (NP-iMCMC) algorithm as a method for constructing MCMC inference algorithms for nonparametric models expressible in universal PPLs. Building on the unifying involutive MCMC framework, and by providing a general procedure for driving state movement between dimensions, we show that NP-iMCMC can generalise numerous existing iMCMC algorithms to work on nonparametric models. We prove the correctness of the NP-iMCMC sampler. Our empirical study shows that the existing strengths of several iMCMC algorithms carry over to their nonparametric extensions. Applying our method to the recently proposed Nonparametric HMC, an instance of (Multiple Step) NP-iMCMC, we have constructed several nonparametric extensions (all of which are new) that exhibit significant performance improvements.' volume: 162 URL: https://proceedings.mlr.press/v162/mak22a.html PDF: https://proceedings.mlr.press/v162/mak22a/mak22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mak22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Carol family: Mak - given: Fabian family: Zaiser - given: Luke family: Ong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14802-14859 id: mak22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14802 lastpage: 14859 published: 2022-06-28 00:00:00 +0000 - title: 'Architecture Agnostic Federated Learning for Neural Networks' abstract: 'With growing concerns regarding data privacy and the rapid increase in data volume, Federated Learning (FL) has become an important learning paradigm. However, jointly learning a deep neural network model in an FL setting proves to be a non-trivial task because of the complexities associated with neural networks, such as varied architectures across clients, permutation invariance of the neurons, and the presence of non-linear transformations in each layer. This work introduces a novel framework, Federated Heterogeneous Neural Networks (FedHeNN), that allows each client to build a personalised model without enforcing a common architecture across clients.
This allows each client to optimize with respect to local data and compute constraints, while still benefiting from the learnings of other (potentially more powerful) clients. The key idea of FedHeNN is to use the instance-level representations obtained from peer clients to guide the simultaneous training on each client. The extensive experimental results demonstrate that the FedHeNN framework is capable of learning better-performing models on clients in both homogeneous and heterogeneous architecture settings across clients.' volume: 162 URL: https://proceedings.mlr.press/v162/makhija22a.html PDF: https://proceedings.mlr.press/v162/makhija22a/makhija22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-makhija22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Disha family: Makhija - given: Xing family: Han - given: Nhat family: Ho - given: Joydeep family: Ghosh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14860-14870 id: makhija22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14860 lastpage: 14870 published: 2022-06-28 00:00:00 +0000 - title: 'Robustness in Multi-Objective Submodular Optimization: a Quantile Approach' abstract: 'The optimization of multi-objective submodular systems appears in a wide variety of applications. However, there are currently very few techniques which are able to provide a robust allocation to such systems. In this work, we propose to design and analyse novel algorithms for the robust allocation of submodular systems through the lens of quantile maximization. We start by observing that identifying an exact solution for this problem is computationally intractable. To tackle this issue, we propose a proxy for the quantile function using a softmax formulation, and show that this proxy is well suited to submodular optimization. Based on this relaxation, we propose a novel and simple algorithm called SOFTSAT. Theoretical properties are provided for this algorithm as well as novel approximation guarantees. Finally, we provide numerical experiments showing the efficiency of our algorithm with regard to state-of-the-art methods in a test bed of real-world applications, and show that SOFTSAT is particularly robust and well-suited to online scenarios.'
volume: 162 URL: https://proceedings.mlr.press/v162/malherbe22a.html PDF: https://proceedings.mlr.press/v162/malherbe22a/malherbe22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-malherbe22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Cedric family: Malherbe - given: Kevin family: Scaman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14871-14886 id: malherbe22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14871 lastpage: 14886 published: 2022-06-28 00:00:00 +0000 - title: 'More Efficient Sampling for Tensor Decomposition With Worst-Case Guarantees' abstract: 'Recent papers have developed alternating least squares (ALS) methods for CP and tensor ring decomposition with a per-iteration cost which is sublinear in the number of input tensor entries for low-rank decomposition. However, the per-iteration cost of these methods still has an exponential dependence on the number of tensor modes when parameters are chosen to achieve certain worst-case guarantees. In this paper, we propose sampling-based ALS methods for the CP and tensor ring decompositions whose cost does not have this exponential dependence, thereby significantly improving on the previous state-of-the-art. We provide a detailed theoretical analysis and also apply the methods in a feature extraction experiment.' volume: 162 URL: https://proceedings.mlr.press/v162/malik22a.html PDF: https://proceedings.mlr.press/v162/malik22a/malik22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-malik22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Osman Asif family: Malik editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14887-14917 id: malik22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14887 lastpage: 14917 published: 2022-06-28 00:00:00 +0000 - title: 'Unaligned Supervision for Automatic Music Transcription in The Wild' abstract: 'Multi-instrument Automatic Music Transcription (AMT), or the decoding of a musical recording into semantic musical content, is one of the holy grails of Music Information Retrieval. Current AMT approaches are restricted to piano and (some) guitar recordings, due to difficult data collection. In order to overcome data collection barriers, previous AMT approaches attempt to employ musical scores in the form of a digitized version of the same song or piece. The scores are typically aligned using audio features and strenuous human intervention to generate training labels. We introduce Note$_{EM}$, a method for simultaneously training a transcriber and aligning the scores to their corresponding performances, in a fully-automated process. Using this unaligned supervision scheme, complemented by pseudo-labels and pitch shift augmentation, our method enables training on in-the-wild recordings with unprecedented accuracy and instrumental variety. 
Using only synthetic data and unaligned supervision, we report SOTA note-level accuracy on the MAPS dataset, and large favorable margins on cross-dataset evaluations. We also demonstrate robustness and ease of use; we report comparable results when training on a small, easily obtainable, self-collected dataset, and we propose alternative labels for the MusicNet dataset, which we show to be more accurate. Our project page is available at https://benadar293.github.io.' volume: 162 URL: https://proceedings.mlr.press/v162/maman22a.html PDF: https://proceedings.mlr.press/v162/maman22a/maman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-maman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ben family: Maman - given: Amit H family: Bermano editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14918-14934 id: maman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14918 lastpage: 14934 published: 2022-06-28 00:00:00 +0000 - title: 'Decision-Focused Learning: Through the Lens of Learning to Rank' abstract: 'In recent years, the decision-focused learning framework, also known as predict-and-optimize, has received increasing attention. In this setting, the predictions of a machine learning model are used as estimated cost coefficients in the objective function of a discrete combinatorial optimization problem for decision making. Decision-focused learning proposes to train the ML models, often neural network models, by directly optimizing the quality of decisions made by the optimization solvers. Based on a recent work that proposed a noise contrastive estimation loss over a subset of the solution space, we observe that decision-focused learning can more generally be seen as a learning-to-rank problem, where the goal is to learn an objective function that ranks the feasible points correctly. This observation is independent of the optimization method used and of the form of the objective function. We develop pointwise, pairwise and listwise ranking loss functions, which can be differentiated in closed form given a subset of solutions. We empirically investigate the quality of our generic methods compared to existing decision-focused learning approaches, with competitive results. Furthermore, controlling the subset of solutions allows controlling the runtime considerably, with limited effect on regret.'
volume: 162 URL: https://proceedings.mlr.press/v162/mandi22a.html PDF: https://proceedings.mlr.press/v162/mandi22a/mandi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mandi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jayanta family: Mandi - given: Vı́ctor family: Bucarey - given: Maxime Mulamba Ke family: Tchomba - given: Tias family: Guns editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14935-14947 id: mandi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14935 lastpage: 14947 published: 2022-06-28 00:00:00 +0000 - title: 'Differentially Private Coordinate Descent for Composite Empirical Risk Minimization' abstract: 'Machine learning models can leak information about the data used to train them. To mitigate this issue, Differentially Private (DP) variants of optimization algorithms like Stochastic Gradient Descent (DP-SGD) have been designed to trade-off utility for privacy in Empirical Risk Minimization (ERM) problems. In this paper, we propose Differentially Private proximal Coordinate Descent (DP-CD), a new method to solve composite DP-ERM problems. We derive utility guarantees through a novel theoretical analysis of inexact coordinate descent. Our results show that, thanks to larger step sizes, DP-CD can exploit imbalance in gradient coordinates to outperform DP-SGD. We also prove new lower bounds for composite DP-ERM under coordinate-wise regularity assumptions, that are nearly matched by DP-CD. For practical implementations, we propose to clip gradients using coordinate-wise thresholds that emerge from our theory, avoiding costly hyperparameter tuning. Experiments on real and synthetic data support our results, and show that DP-CD compares favorably with DP-SGD.' volume: 162 URL: https://proceedings.mlr.press/v162/mangold22a.html PDF: https://proceedings.mlr.press/v162/mangold22a/mangold22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mangold22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Paul family: Mangold - given: Aurélien family: Bellet - given: Joseph family: Salmon - given: Marc family: Tommasi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14948-14978 id: mangold22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14948 lastpage: 14978 published: 2022-06-28 00:00:00 +0000 - title: 'Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models' abstract: 'We revisit the classical problem of deriving convergence rates for the maximum likelihood estimator (MLE) in finite mixture models. The Wasserstein distance has become a standard loss function for the analysis of parameter estimation in these models, due in part to its ability to circumvent label switching and to accurately characterize the behaviour of fitted mixture components with vanishing weights. However, the Wasserstein distance is only able to capture the worst-case convergence rate among the remaining fitted mixture components. 
We demonstrate that when the log-likelihood function is penalized to discourage vanishing mixing weights, stronger loss functions can be derived to resolve this shortcoming of the Wasserstein distance. These new loss functions accurately capture the heterogeneity in convergence rates of fitted mixture components, and we use them to sharpen existing pointwise and uniform convergence rates in various classes of mixture models. In particular, these results imply that a subset of the components of the penalized MLE typically converge significantly faster than could have been anticipated from past work. We further show that some of these conclusions extend to the traditional MLE. Our theoretical findings are supported by a simulation study to illustrate these improved convergence rates.' volume: 162 URL: https://proceedings.mlr.press/v162/manole22a.html PDF: https://proceedings.mlr.press/v162/manole22a/manole22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-manole22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tudor family: Manole - given: Nhat family: Ho editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 14979-15006 id: manole22a issued: date-parts: - 2022 - 6 - 28 firstpage: 14979 lastpage: 15006 published: 2022-06-28 00:00:00 +0000 - title: 'On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning' abstract: 'Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential sample complexity dependence on the number of agents, a phenomenon known as the curse of multiagents. We address this challenge by investigating sample-efficient model-free algorithms in decentralized MARL, and aim to improve existing algorithms along this line. For learning (coarse) correlated equilibria in general-sum Markov games, we propose stage-based V-learning algorithms that significantly simplify the algorithmic design and analysis of recent works, and circumvent a rather complicated no-weighted-regret bandit subroutine. For learning Nash equilibria in Markov potential games, we propose an independent policy gradient algorithm with a decentralized momentum-based variance reduction technique. All our algorithms are decentralized in that each agent can make decisions based on only its local information. Neither communication nor centralized coordination is required during learning, leading to a natural generalization to a large number of agents. Finally, we provide numerical simulations to corroborate our theoretical findings.' 
volume: 162 URL: https://proceedings.mlr.press/v162/mao22a.html PDF: https://proceedings.mlr.press/v162/mao22a/mao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weichao family: Mao - given: Lin family: Yang - given: Kaiqing family: Zhang - given: Tamer family: Basar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15007-15049 id: mao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15007 lastpage: 15049 published: 2022-06-28 00:00:00 +0000 - title: 'On the Effects of Artificial Data Modification' abstract: 'Data distortion is commonly applied in vision models during both training (e.g., methods like MixUp and CutMix) and evaluation (e.g., shape-texture bias and robustness). This data modification can introduce artificial information. It is often assumed that the resulting artefacts are detrimental to training, whilst being negligible when analysing models. We investigate these assumptions and conclude that in some cases they are unfounded and lead to incorrect results. Specifically, we show that current shape bias identification methods and occlusion robustness measures are biased and propose a fairer alternative for the latter. Subsequently, through a series of experiments we seek to correct and strengthen the community’s perception of how augmentation affects the learning of vision models. Based on our empirical results, we argue that the impact of the artefacts must be understood and exploited rather than eliminated.' volume: 162 URL: https://proceedings.mlr.press/v162/marcu22a.html PDF: https://proceedings.mlr.press/v162/marcu22a/marcu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-marcu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Antonia family: Marcu - given: Adam family: Prugel-Bennett editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15050-15069 id: marcu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15050 lastpage: 15069 published: 2022-06-28 00:00:00 +0000 - title: 'Personalized Federated Learning through Local Memorization' abstract: 'Federated learning allows clients to collaboratively learn statistical models while keeping their data local. Federated learning was originally used to train a unique global model to be served to all clients, but this approach might be sub-optimal when clients’ local data distributions are heterogeneous. In order to tackle this limitation, recent personalized federated learning methods train a separate model for each client while still leveraging the knowledge available at other clients. In this work, we exploit the ability of deep neural networks to extract high quality vectorial representations (embeddings) from non-tabular data, e.g., images and text, to propose a personalization mechanism based on local memorization.
Personalization is obtained by interpolating a collectively trained global model with a local $k$-nearest neighbors (kNN) model based on the shared representation provided by the global model. We provide generalization bounds for the proposed approach in the case of binary classification, and we show on a suite of federated datasets that this approach achieves significantly higher accuracy and fairness than state-of-the-art methods.' volume: 162 URL: https://proceedings.mlr.press/v162/marfoq22a.html PDF: https://proceedings.mlr.press/v162/marfoq22a/marfoq22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-marfoq22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Othmane family: Marfoq - given: Giovanni family: Neglia - given: Richard family: Vidal - given: Laetitia family: Kameni editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15070-15092 id: marfoq22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15070 lastpage: 15092 published: 2022-06-28 00:00:00 +0000 - title: 'Nested Bandits' abstract: 'In many online decision processes, the optimizing agent is called to choose between large numbers of alternatives with many inherent similarities; in turn, these similarities imply closely correlated losses that may confound standard discrete choice models and bandit algorithms. We study this question in the context of nested bandits, a class of adversarial multi-armed bandit problems where the learner seeks to minimize their regret in the presence of a large number of distinct alternatives with a hierarchy of embedded (non-combinatorial) similarities. In this setting, optimal algorithms based on the exponential weights blueprint (like Hedge, EXP3, and their variants) may incur significant regret because they tend to spend excessive amounts of time exploring irrelevant alternatives with similar, suboptimal costs. To account for this, we propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner’s set of alternatives based on a nested, step-by-step selection method. In so doing, we obtain a series of tight bounds for the learner’s regret showing that online learning problems with a high degree of similarity between alternatives can be resolved efficiently, without a red bus / blue bus paradox occurring.' 
volume: 162 URL: https://proceedings.mlr.press/v162/martin22a.html PDF: https://proceedings.mlr.press/v162/martin22a/martin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-martin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Matthieu family: Martin - given: Panayotis family: Mertikopoulos - given: Thibaud family: Rahier - given: Houssam family: Zenati editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15093-15121 id: martin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15093 lastpage: 15121 published: 2022-06-28 00:00:00 +0000 - title: 'Closed-Form Diffeomorphic Transformations for Time Series Alignment' abstract: 'Time series alignment methods call for highly expressive, differentiable and invertible warping functions which preserve temporal topology, i.e. diffeomorphisms. Diffeomorphic warping functions can be generated from the integration of velocity fields governed by an ordinary differential equation (ODE). Gradient-based optimization frameworks containing diffeomorphic transformations require calculating derivatives of the differential equation’s solution with respect to the model parameters, i.e. sensitivity analysis. Unfortunately, deep learning frameworks typically lack automatic-differentiation-compatible sensitivity analysis methods; and implicit functions, such as the solution of an ODE, require particular care. Current solutions appeal to adjoint sensitivity methods, ad-hoc numerical solvers or ResNet’s Eulerian discretization. In this work, we present a closed-form expression for the ODE solution and its gradient under continuous piecewise-affine (CPA) velocity functions. We present a highly optimized implementation of the results on CPU and GPU. Furthermore, we conduct extensive experiments on several datasets to validate the generalization ability of our model to unseen data for time-series joint alignment. Results show significant improvements in terms of both efficiency and accuracy.' volume: 162 URL: https://proceedings.mlr.press/v162/martinez22a.html PDF: https://proceedings.mlr.press/v162/martinez22a/martinez22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-martinez22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Iñigo family: Martinez - given: Elisabeth family: Viles - given: Igor G. family: Olaizola editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15122-15158 id: martinez22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15122 lastpage: 15158 published: 2022-06-28 00:00:00 +0000 - title: 'SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators' abstract: 'We approach the graph generation problem from a spectral perspective by first generating the dominant parts of the graph Laplacian spectrum and then building a graph matching these eigenvalues and eigenvectors.
Spectral conditioning allows for direct modeling of the global and local graph structure and helps to overcome the expressivity and mode collapse issues of one-shot graph generators. Our novel GAN, called SPECTRE, enables the one-shot generation of much larger graphs than previously possible with one-shot models. SPECTRE outperforms state-of-the-art deep autoregressive generators in terms of modeling fidelity, while also avoiding expensive sequential generation and dependence on node ordering. A case in point, in sizable synthetic and real-world graphs SPECTRE achieves a 4-to-170 fold improvement over the best competitor that does not overfit and is 23-to-30 times faster than autoregressive generators.' volume: 162 URL: https://proceedings.mlr.press/v162/martinkus22a.html PDF: https://proceedings.mlr.press/v162/martinkus22a/martinkus22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-martinkus22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Karolis family: Martinkus - given: Andreas family: Loukas - given: Nathanaël family: Perraudin - given: Roger family: Wattenhofer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15159-15179 id: martinkus22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15159 lastpage: 15179 published: 2022-06-28 00:00:00 +0000 - title: 'Modular Conformal Calibration' abstract: 'Uncertainty estimates must be calibrated (i.e., accurate) and sharp (i.e., informative) in order to be useful. This has motivated a variety of methods for recalibration, which use held-out data to turn an uncalibrated model into a calibrated model. However, the applicability of existing methods is limited due to their assumption that the original model is also a probabilistic model. We introduce a versatile class of algorithms for recalibration in regression that we call modular conformal calibration (MCC). This framework allows one to transform any regression model into a calibrated probabilistic model. The modular design of MCC allows us to make simple adjustments to existing algorithms that enable well-behaved distribution predictions. We also provide finite-sample calibration guarantees for MCC algorithms. Our framework recovers isotonic recalibration, conformal calibration, and conformal interval prediction, implying that our theoretical results apply to those methods as well. Finally, we conduct an empirical study of MCC on 17 regression datasets. Our results show that new algorithms designed in our framework achieve near-perfect calibration and improve sharpness relative to existing methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/marx22a.html PDF: https://proceedings.mlr.press/v162/marx22a/marx22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-marx22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Charles family: Marx - given: Shengjia family: Zhao - given: Willie family: Neiswanger - given: Stefano family: Ermon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15180-15195 id: marx22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15180 lastpage: 15195 published: 2022-06-28 00:00:00 +0000 - title: 'Continual Repeated Annealed Flow Transport Monte Carlo' abstract: 'We propose Continual Repeated Annealed Flow Transport Monte Carlo (CRAFT), a method that combines a sequential Monte Carlo (SMC) sampler (itself a generalization of Annealed Importance Sampling) with variational inference using normalizing flows. The normalizing flows are directly trained to transport between annealing temperatures using a KL divergence for each transition. This optimization objective is itself estimated using the normalizing flow/SMC approximation. We show conceptually and using multiple empirical examples that CRAFT improves on Annealed Flow Transport Monte Carlo (Arbel et al., 2021), on which it builds and also on Markov chain Monte Carlo (MCMC) based Stochastic Normalizing Flows (Wu et al., 2020). By incorporating CRAFT within particle MCMC, we show that such learnt samplers can achieve impressively accurate results on a challenging lattice field theory example.' volume: 162 URL: https://proceedings.mlr.press/v162/matthews22a.html PDF: https://proceedings.mlr.press/v162/matthews22a/matthews22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-matthews22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alex family: Matthews - given: Michael family: Arbel - given: Danilo Jimenez family: Rezende - given: Arnaud family: Doucet editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15196-15219 id: matthews22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15196 lastpage: 15219 published: 2022-06-28 00:00:00 +0000 - title: 'How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation' abstract: 'When extrinsic rewards are sparse, artificial agents struggle to explore an environment. Curiosity, implemented as an intrinsic reward for prediction errors, can improve exploration but it is known to fail when faced with action-dependent noise sources (‘noisy TVs’). In an attempt to make exploring agents robust to Noisy TVs, we present a simple solution: aleatoric mapping agents (AMAs). AMAs are a novel form of curiosity that explicitly ascertain which state transitions of the environment are unpredictable, even if those dynamics are induced by the actions of the agent. This is achieved by generating separate forward predictions for the mean and aleatoric uncertainty of future states, with the aim of reducing intrinsic rewards for those transitions that are unpredictable. 
We demonstrate that in a range of environments AMAs are able to circumvent action-dependent stochastic traps that immobilise conventional curiosity driven agents. Furthermore, we demonstrate empirically that other common exploration approaches—previously thought to be immune to agent-induced randomness—can be trapped by stochastic dynamics.' volume: 162 URL: https://proceedings.mlr.press/v162/mavor-parker22a.html PDF: https://proceedings.mlr.press/v162/mavor-parker22a/mavor-parker22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mavor-parker22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Augustine family: Mavor-Parker - given: Kimberly family: Young - given: Caswell family: Barry - given: Lewis family: Griffin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15220-15240 id: mavor-parker22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15220 lastpage: 15240 published: 2022-06-28 00:00:00 +0000 - title: 'How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection' abstract: 'Model stealing attacks present a dilemma for public machine learning APIs. To protect financial investments, companies may be forced to withhold important information about their models that could facilitate theft, including uncertainty estimates and prediction explanations. This compromise is harmful not only to users but also to external transparency. Model stealing defenses seek to resolve this dilemma by making models harder to steal while preserving utility for benign users. However, existing defenses have poor performance in practice, either requiring enormous computational overheads or severe utility trade-offs. To meet these challenges, we present a new approach to model stealing defenses called gradient redirection. At the core of our approach is a provably optimal, efficient algorithm for steering an adversary’s training updates in a targeted manner. Combined with improvements to surrogate networks and a novel coordinated defense strategy, our gradient redirection defense, called GRAD^2, achieves small utility trade-offs and low computational overhead, outperforming the best prior defenses. Moreover, we demonstrate how gradient redirection enables reprogramming the adversary with arbitrary behavior, which we hope will foster work on new avenues of defense.' 
volume: 162 URL: https://proceedings.mlr.press/v162/mazeika22a.html PDF: https://proceedings.mlr.press/v162/mazeika22a/mazeika22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mazeika22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mantas family: Mazeika - given: Bo family: Li - given: David family: Forsyth editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15241-15254 id: mazeika22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15241 lastpage: 15254 published: 2022-06-28 00:00:00 +0000 - title: 'Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features' abstract: 'Decision trees are one of the most useful and popular methods in the machine learning toolbox. In this paper, we consider the problem of learning optimal decision trees, a combinatorial optimization problem that is challenging to solve at scale. A common approach in the literature is to use greedy heuristics, which may not be optimal. Recently there has been significant interest in learning optimal decision trees using various approaches (e.g., based on integer programming, dynamic programming)—to achieve computational scalability, most of these approaches focus on classification tasks with binary features. In this paper, we present a new discrete optimization method based on branch-and-bound (BnB) to obtain optimal decision trees. Different from existing customized approaches, we consider both regression and classification tasks with continuous features. The basic idea underlying our approach is to split the search space based on the quantiles of the feature distribution—leading to upper and lower bounds for the underlying optimization problem along the BnB iterations. Our proposed algorithm Quant-BnB shows significant speedups compared to existing approaches for shallow optimal trees on various real datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/mazumder22a.html PDF: https://proceedings.mlr.press/v162/mazumder22a/mazumder22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mazumder22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rahul family: Mazumder - given: Xiang family: Meng - given: Haoyue family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15255-15277 id: mazumder22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15255 lastpage: 15277 published: 2022-06-28 00:00:00 +0000 - title: 'Optimizing Tensor Network Contraction Using Reinforcement Learning' abstract: 'Quantum Computing (QC) stands to revolutionize computing, but is currently still limited. To develop and test quantum algorithms today, quantum circuits are often simulated on classical computers. Simulating a complex quantum circuit requires computing the contraction of a large network of tensors. The order (path) of contraction can have a drastic effect on the computing cost, but finding an efficient order is a challenging combinatorial optimization problem. 
We propose a Reinforcement Learning (RL) approach combined with Graph Neural Networks (GNN) to address the contraction ordering problem. The problem is extremely challenging due to the huge search space, the heavy-tailed reward distribution, and the difficulty of credit assignment. We show how a carefully implemented RL agent that uses a GNN as the basic policy construct can address these challenges and obtain significant improvements over state-of-the-art techniques in three varieties of circuits, including the largest scale networks used in contemporary QC.' volume: 162 URL: https://proceedings.mlr.press/v162/meirom22a.html PDF: https://proceedings.mlr.press/v162/meirom22a/meirom22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-meirom22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Eli family: Meirom - given: Haggai family: Maron - given: Shie family: Mannor - given: Gal family: Chechik editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15278-15292 id: meirom22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15278 lastpage: 15292 published: 2022-06-28 00:00:00 +0000 - title: 'Causal Transformer for Estimating Counterfactual Outcomes' abstract: 'Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inferences for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarial balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer based on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work proposing a transformer-based architecture for estimating counterfactual outcomes from longitudinal data.' 
volume: 162 URL: https://proceedings.mlr.press/v162/melnychuk22a.html PDF: https://proceedings.mlr.press/v162/melnychuk22a/melnychuk22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-melnychuk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Valentyn family: Melnychuk - given: Dennis family: Frauen - given: Stefan family: Feuerriegel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15293-15329 id: melnychuk22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15293 lastpage: 15329 published: 2022-06-28 00:00:00 +0000 - title: 'Steerable 3D Spherical Neurons' abstract: 'Emerging from low-level vision theory, steerable filters found their counterpart in prior work on steerable convolutional neural networks equivariant to rigid transformations. In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. Such spherical neurons are obtained by conformal embedding of Euclidean space and have recently been revisited in the context of learning representations of point sets. Focusing on 3D geometry, we exploit the isometry property of spherical neurons and derive a 3D steerability constraint. After training spherical neurons to classify point clouds in a canonical orientation, we use a tetrahedron basis to quadruplicate the neurons and construct rotation-equivariant spherical filter banks. We then apply the derived constraint to interpolate the filter bank outputs and, thus, obtain a rotation-invariant network. Finally, we use a synthetic point set and real-world 3D skeleton data to verify our theoretical findings. The code is available at https://github.com/pavlo-melnyk/steerable-3d-neurons.' volume: 162 URL: https://proceedings.mlr.press/v162/melnyk22a.html PDF: https://proceedings.mlr.press/v162/melnyk22a/melnyk22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-melnyk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pavlo family: Melnyk - given: Michael family: Felsberg - given: Mårten family: Wadenbäck editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15330-15339 id: melnyk22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15330 lastpage: 15339 published: 2022-06-28 00:00:00 +0000 - title: 'Transformers are Meta-Reinforcement Learners' abstract: 'The transformer architecture and its variants have achieved remarkable success across many machine learning tasks in recent years. This success is intrinsically related to the capability of handling long sequences and the presence of context-dependent weights from the attention mechanism. We argue that these capabilities suit the central role of a Meta-Reinforcement Learning algorithm. Indeed, a meta-RL agent needs to infer the task from a sequence of trajectories. Furthermore, it requires a fast adaptation strategy to adapt its policy for a new task, which can be achieved using the self-attention mechanism. 
In this work, we present TrMRL (Transformers for Meta-Reinforcement Learning), a meta-RL agent that mimics the memory reinstatement mechanism using the transformer architecture. It associates the recent past of working memories to build an episodic memory recursively through the transformer layers. We show that the self-attention computes a consensus representation that minimizes the Bayes Risk at each layer and provides meaningful features to compute the best actions. We conducted experiments in high-dimensional continuous control environments for locomotion and dexterous manipulation. Results show that TrMRL presents comparable or superior asymptotic performance, sample efficiency, and out-of-distribution generalization compared to the baselines in these environments.' volume: 162 URL: https://proceedings.mlr.press/v162/melo22a.html PDF: https://proceedings.mlr.press/v162/melo22a/melo22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-melo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Luckeciano C family: Melo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15340-15359 id: melo22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15340 lastpage: 15359 published: 2022-06-28 00:00:00 +0000 - title: 'ButterflyFlow: Building Invertible Layers with Butterfly Matrices' abstract: 'Normalizing flows model complex probability distributions using maps obtained by composing invertible layers. Special linear layers such as masked and 1×1 convolutions play a key role in existing architectures because they increase expressive power while having tractable Jacobians and inverses. We propose a new family of invertible linear layers based on butterfly layers, which are known to theoretically capture complex linear structures including permutations and periodicity, yet can be inverted efficiently. This representational power is a key advantage of our approach, as such structures are common in many real-world datasets. Based on our invertible butterfly layers, we construct a new class of normalizing flow models called ButterflyFlow. Empirically, we demonstrate that ButterflyFlows not only achieve strong density estimation results on natural images such as MNIST, CIFAR-10, and ImageNet-32×32, but also obtain significantly better log-likelihoods on structured datasets such as galaxy images and MIMIC-III patient cohorts, all while being more efficient in terms of memory and computation than relevant baselines.' 
volume: 162 URL: https://proceedings.mlr.press/v162/meng22a.html PDF: https://proceedings.mlr.press/v162/meng22a/meng22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-meng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chenlin family: Meng - given: Linqi family: Zhou - given: Kristy family: Choi - given: Tri family: Dao - given: Stefano family: Ermon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15360-15375 id: meng22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15360 lastpage: 15375 published: 2022-06-28 00:00:00 +0000 - title: 'In defense of dual-encoders for neural ranking' abstract: 'Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query. There are two broad flavours of such models: cross-attention (CA) models, which learn a joint embedding for the query and document, and dual-encoder (DE) models, which learn separate embeddings for the query and document. Empirically, CA models are often found to be more accurate, which has motivated a series of works seeking to bridge this gap. However, a more fundamental question remains less explored: does this performance gap reflect an inherent limitation in the capacity of DE models, or a limitation in the training of such models? And does such an understanding suggest a principled means of improving DE models? In this paper, we study these questions, with three contributions. First, we establish theoretically that with a sufficiently large embedding dimension, DE models have the capacity to model a broad class of score distributions. Second, we show empirically that on real-world problems, DE models may overfit to spurious correlations in the training set, and thus under-perform on test samples. To mitigate this behaviour, we propose a suitable distillation strategy, and confirm its practical efficacy on the MSMARCO-Passage and Natural Questions benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/menon22a.html PDF: https://proceedings.mlr.press/v162/menon22a/menon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-menon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aditya family: Menon - given: Sadeep family: Jayasumana - given: Ankit Singh family: Rawat - given: Seungyeon family: Kim - given: Sashank family: Reddi - given: Sanjiv family: Kumar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15376-15400 id: menon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15376 lastpage: 15400 published: 2022-06-28 00:00:00 +0000 - title: 'Equivariant Quantum Graph Circuits' abstract: 'We investigate quantum circuits for graph representation learning, and propose equivariant quantum graph circuits (EQGCs) as a class of parameterized quantum circuits with strong relational inductive bias for learning over graph-structured data. 
Conceptually, EQGCs serve as a unifying framework for quantum graph representation learning, allowing us to define several interesting subclasses which subsume existing proposals. In terms of the representation power, we prove that the studied subclasses of EQGCs are universal approximators for functions over the bounded graph domain. This theoretical perspective on quantum graph machine learning methods opens many directions for further work, and could lead to models with capabilities beyond those of classical approaches. We empirically verify the expressive power of EQGCs through a dedicated experiment on synthetic data, and additionally observe that the performance of EQGCs scales well with the depth of the model and does not suffer from barren plateau issues.' volume: 162 URL: https://proceedings.mlr.press/v162/mernyei22a.html PDF: https://proceedings.mlr.press/v162/mernyei22a/mernyei22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mernyei22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peter family: Mernyei - given: Konstantinos family: Meichanetzidis - given: Ismail Ilkan family: Ceylan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15401-15420 id: mernyei22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15401 lastpage: 15420 published: 2022-06-28 00:00:00 +0000 - title: 'Stochastic Rising Bandits' abstract: 'This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms’ expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset. Finally, using synthetic and real-world data, we illustrate the effectiveness of the proposed approaches compared with state-of-the-art algorithms for non-stationary bandits.' 
volume: 162 URL: https://proceedings.mlr.press/v162/metelli22a.html PDF: https://proceedings.mlr.press/v162/metelli22a/metelli22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-metelli22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alberto Maria family: Metelli - given: Francesco family: Trovò - given: Matteo family: Pirola - given: Marcello family: Restelli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15421-15457 id: metelli22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15421 lastpage: 15457 published: 2022-06-28 00:00:00 +0000 - title: 'Minimizing Control for Credit Assignment with Strong Feedback' abstract: 'The success of deep learning ignited interest in whether the brain learns hierarchical representations using gradient-based learning. However, current biologically plausible methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals, which is problematic in biologically realistic noisy environments and at odds with experimental evidence in neuroscience showing that top-down feedback can significantly influence neural activity. Building upon deep feedback control (DFC), a recently proposed credit assignment method, we combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization. Instead of gradually changing the network weights towards configurations with low output loss, weight updates gradually minimize the amount of feedback required from a controller that drives the network to the supervised output label. Moreover, we show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using learning rules fully local in space and time. We complement our theoretical results with experiments on standard computer-vision benchmarks, showing competitive performance to backpropagation as well as robustness to noise. Overall, our work presents a fundamentally novel view of learning as control minimization, while sidestepping biologically unrealistic assumptions.' volume: 162 URL: https://proceedings.mlr.press/v162/meulemans22a.html PDF: https://proceedings.mlr.press/v162/meulemans22a/meulemans22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-meulemans22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexander family: Meulemans - given: Matilde Tristany family: Farinha - given: Maria R. family: Cervera - given: João family: Sacramento - given: Benjamin F. family: Grewe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15458-15483 id: meulemans22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15458 lastpage: 15483 published: 2022-06-28 00:00:00 +0000 - title: 'A Dynamical System Perspective for Lipschitz Neural Networks' abstract: 'The Lipschitz constant of neural networks has been established as a key quantity to enforce the robustness to adversarial examples. 
In this paper, we tackle the problem of building $1$-Lipschitz Neural Networks. By studying Residual Networks from a continuous time dynamical system perspective, we provide a generic method to build $1$-Lipschitz Neural Networks and show that some previous approaches are special cases of this framework. Then, we extend this reasoning and show that ResNet flows derived from convex potentials define $1$-Lipschitz transformations, which leads us to define the Convex Potential Layer (CPL). A comprehensive set of experiments on several datasets demonstrates the scalability of our architecture and its benefits as an $\ell_2$-provable defense against adversarial examples. Our code is available at \url{https://github.com/MILES-PSL/Convex-Potential-Layer}' volume: 162 URL: https://proceedings.mlr.press/v162/meunier22a.html PDF: https://proceedings.mlr.press/v162/meunier22a/meunier22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-meunier22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Laurent family: Meunier - given: Blaise J family: Delattre - given: Alexandre family: Araujo - given: Alexandre family: Allauzen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15484-15500 id: meunier22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15484 lastpage: 15500 published: 2022-06-28 00:00:00 +0000 - title: 'Distribution Regression with Sliced Wasserstein Kernels' abstract: 'The problem of learning functions over spaces of probabilities - or distribution regression - is gaining significant interest in the machine learning community. The main challenge in these settings is to identify a suitable representation capturing all relevant properties of a distribution. The well-established approach in this sense is to use kernel mean embeddings, which lift kernel-induced similarity on the input domain at the probability level. This strategy effectively tackles the two-stage sampling nature of the problem, enabling one to derive estimators with strong statistical guarantees, such as universal consistency and excess risk bounds. However, kernel mean embeddings implicitly hinge on the maximum mean discrepancy (MMD), a metric on probabilities, which is not the most suited to capture geometrical relations between distributions. In contrast, optimal transport (OT) metrics are potentially more appealing. In this work, we propose an OT-based estimator for distribution regression. We build on the Sliced Wasserstein distance to obtain an OT-based representation. We study the theoretical properties of a kernel ridge regression estimator based on such a representation, for which we prove universal consistency and excess risk bounds. Preliminary experiments complement our theoretical findings by showing the effectiveness of the proposed approach and comparing it with MMD-based estimators.' 
volume: 162 URL: https://proceedings.mlr.press/v162/meunier22b.html PDF: https://proceedings.mlr.press/v162/meunier22b/meunier22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-meunier22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dimitri family: Meunier - given: Massimiliano family: Pontil - given: Carlo family: Ciliberto editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15501-15523 id: meunier22b issued: date-parts: - 2022 - 6 - 28 firstpage: 15501 lastpage: 15523 published: 2022-06-28 00:00:00 +0000 - title: 'Interpretable and Generalizable Graph Learning via Stochastic Attention Mechanism' abstract: 'Interpretable graph learning is needed, as many scientific applications depend on learning models to collect insights from graph-structured data. Previous works mostly focused on using post-hoc approaches to interpret pre-trained models (graph neural networks in particular). They argue against inherently interpretable models because the good interpretability of these models is often at the cost of their prediction accuracy. However, those post-hoc methods often fail to provide stable interpretation and may extract features that are spuriously correlated with the task. In this work, we address these issues by proposing Graph Stochastic Attention (GSAT). Derived from the information bottleneck principle, GSAT injects stochasticity into the attention weights to block the information from task-irrelevant graph components while learning stochasticity-reduced attention to select task-relevant subgraphs for interpretation. The selected subgraphs provably do not contain patterns that are spuriously correlated with the task under some assumptions. Extensive experiments on eight datasets show that GSAT outperforms the state-of-the-art methods by up to 20% in interpretation AUC and 5% in prediction accuracy. Our code is available at https://github.com/Graph-COM/GSAT.' volume: 162 URL: https://proceedings.mlr.press/v162/miao22a.html PDF: https://proceedings.mlr.press/v162/miao22a/miao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-miao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Siqi family: Miao - given: Mia family: Liu - given: Pan family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15524-15543 id: miao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15524 lastpage: 15543 published: 2022-06-28 00:00:00 +0000 - title: 'Modeling Structure with Undirected Neural Networks' abstract: 'Neural networks are powerful function estimators, leading to their status as a paradigm of choice for modeling structured data. However, unlike other structured representations that emphasize the modularity of the problem (e.g., factor graphs), neural networks are usually monolithic mappings from inputs to outputs, with a fixed computation order. This limitation prevents them from capturing different directions of computation and interaction between the modeled variables. 
In this paper, we combine the representational strengths of factor graphs and of neural networks, proposing undirected neural networks (UNNs): a flexible framework for specifying computations that can be performed in any order. For particular choices, our proposed models subsume and extend many existing architectures: feed-forward, recurrent, self-attention networks, auto-encoders, and networks with implicit layers. We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks: tree-constrained dependency parsing, convolutional image classification, and sequence completion with attention. By varying the computation order, we show how a single UNN can be used both as a classifier and a prototype generator, and how it can fill in missing parts of an input sequence, making them a promising field for further research.' volume: 162 URL: https://proceedings.mlr.press/v162/mihaylova22a.html PDF: https://proceedings.mlr.press/v162/mihaylova22a/mihaylova22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mihaylova22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tsvetomila family: Mihaylova - given: Vlad family: Niculae - given: Andre family: Martins editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15544-15560 id: mihaylova22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15544 lastpage: 15560 published: 2022-06-28 00:00:00 +0000 - title: 'Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models' abstract: 'A large number of neural network models of associative memory have been proposed in the literature. These include the classical Hopfield networks (HNs), sparse distributed memories (SDMs), and more recently the modern continuous Hopfield networks (MCHNs), which possess close links with self-attention in machine learning. In this paper, we propose a general framework for understanding the operation of such memory networks as a sequence of three operations: similarity, separation, and projection. We derive all these memory models as instances of our general framework with differing similarity and separation functions. We extend the mathematical framework of Krotov et al (2020) to express general associative memory models using neural network dynamics with local computation, and derive a general energy function that is a Lyapunov function of the dynamics. Finally, using our framework, we empirically investigate the capacity of using different similarity functions for these associative memory models, beyond the dot product similarity measure, and demonstrate empirically that Euclidean or Manhattan distance similarity metrics perform substantially better in practice on many tasks, enabling a more robust retrieval and higher memory capacity than existing models.' 
volume: 162 URL: https://proceedings.mlr.press/v162/millidge22a.html PDF: https://proceedings.mlr.press/v162/millidge22a/millidge22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-millidge22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Beren family: Millidge - given: Tommaso family: Salvatori - given: Yuhang family: Song - given: Thomas family: Lukasiewicz - given: Rafal family: Bogacz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15561-15583 id: millidge22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15561 lastpage: 15583 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Stochastic Shortest Path with Linear Function Approximation' abstract: 'We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We refer to this class of SSP problems as linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an $\tilde{\mathcal{O}}(d B_{\star}^{1.5}\sqrt{K/c_{\min}})$ regret. Here $K$ is the number of episodes, $d$ is the dimension of the feature mapping in the mixture model, $B_{\star}$ bounds the expected cumulative cost of the optimal policy, and $c_{\min}>0$ is the lower bound of the cost function. Our algorithm also applies to the case when $c_{\min} = 0$, and an $\tilde{\mathcal{O}}(K^{2/3})$ regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an $\tilde{\mathcal{O}}(d B_{\star}\sqrt{K/c_{\min}})$ regret. To complement the regret upper bounds, we also prove a lower bound of $\Omega(dB_{\star} \sqrt{K})$. Hence, our improved algorithm matches the lower bound up to a $1/\sqrt{c_{\min}}$ factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.' volume: 162 URL: https://proceedings.mlr.press/v162/min22a.html PDF: https://proceedings.mlr.press/v162/min22a/min22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-min22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yifei family: Min - given: Jiafan family: He - given: Tianhao family: Wang - given: Quanquan family: Gu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15584-15629 id: min22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15584 lastpage: 15629 published: 2022-06-28 00:00:00 +0000 - title: 'Prioritized Training on Points that are Learnable, Worth Learning, and not yet Learnt' abstract: 'Training on web-scale data can take months. But much computation and time is wasted on redundant and noisy points that are already learnt or not learnable. 
To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model’s generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select "hard" (e.g. high loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes "easy" points, but such points need not be trained on once learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.' volume: 162 URL: https://proceedings.mlr.press/v162/mindermann22a.html PDF: https://proceedings.mlr.press/v162/mindermann22a/mindermann22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mindermann22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sören family: Mindermann - given: Jan M family: Brauner - given: Muhammed T family: Razzak - given: Mrinank family: Sharma - given: Andreas family: Kirsch - given: Winnie family: Xu - given: Benedikt family: Höltgen - given: Aidan N family: Gomez - given: Adrien family: Morisot - given: Sebastian family: Farquhar - given: Yarin family: Gal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15630-15649 id: mindermann22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15630 lastpage: 15649 published: 2022-06-28 00:00:00 +0000 - title: 'POEM: Out-of-Distribution Detection with Posterior Sampling' abstract: 'Out-of-distribution (OOD) detection is indispensable for machine learning models deployed in the open world. Recently, the use of an auxiliary outlier dataset during training (also known as outlier exposure) has shown promising performance. As the sample space for potential OOD data can be prohibitively large, sampling informative outliers is essential. In this work, we propose a novel posterior sampling based outlier mining framework, POEM, which facilitates efficient use of outlier data and promotes learning a compact decision boundary between ID and OOD data for improved detection. We show that POEM establishes state-of-the-art performance on common benchmarks. Compared to the current best method that uses a greedy sampling strategy, POEM improves the relative performance by 42.0% and 24.2% (FPR95) on CIFAR-10 and CIFAR-100, respectively. We further provide theoretical insights on the effectiveness of POEM for OOD detection.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ming22a.html PDF: https://proceedings.mlr.press/v162/ming22a/ming22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ming22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yifei family: Ming - given: Ying family: Fan - given: Yixuan family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15650-15665 id: ming22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15650 lastpage: 15665 published: 2022-06-28 00:00:00 +0000 - title: 'A Simple Reward-free Approach to Constrained Reinforcement Learning' abstract: 'In constrained reinforcement learning (RL), a learning agent seeks not only to optimize the overall reward but also to satisfy additional safety, diversity, or budget constraints. Consequently, existing constrained RL solutions require several new algorithmic ingredients that are notably different from standard RL. On the other hand, reward-free RL was developed independently in the unconstrained literature; it learns the transition dynamics without using the reward information, and is thus naturally capable of addressing RL with multiple objectives under common dynamics. This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity. Utilizing the existing reward-free RL solvers, our framework provides sharp sample complexity results for constrained RL in the tabular MDP setting, matching the best existing results up to a factor of horizon dependence; our framework directly extends to a setting of tabular two-player Markov games, and gives a new result for constrained RL with linear function approximation.' volume: 162 URL: https://proceedings.mlr.press/v162/miryoosefi22a.html PDF: https://proceedings.mlr.press/v162/miryoosefi22a/miryoosefi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-miryoosefi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sobhan family: Miryoosefi - given: Chi family: Jin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15666-15698 id: miryoosefi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15666 lastpage: 15698 published: 2022-06-28 00:00:00 +0000 - title: 'Wide Neural Networks Forget Less Catastrophically' abstract: 'A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to distribution shifts. While recent progress in the continual learning literature is encouraging, our understanding of what properties of neural networks contribute to catastrophic forgetting is still limited. 
To address this, instead of focusing on continual learning algorithms, in this work, we focus on the model itself and study the impact of "width" of the neural network architecture on catastrophic forgetting, and show that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives such as gradient orthogonality, sparsity, and lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/mirzadeh22a.html PDF: https://proceedings.mlr.press/v162/mirzadeh22a/mirzadeh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mirzadeh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Seyed Iman family: Mirzadeh - given: Arslan family: Chaudhry - given: Dong family: Yin - given: Huiyi family: Hu - given: Razvan family: Pascanu - given: Dilan family: Gorur - given: Mehrdad family: Farajtabar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15699-15717 id: mirzadeh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15699 lastpage: 15717 published: 2022-06-28 00:00:00 +0000 - title: 'Proximal and Federated Random Reshuffling' abstract: 'Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization. We propose two new algorithms: Proximal and Federated Random Reshuffling (ProxRR and FedRR). The first algorithm, ProxRR, solves composite finite-sum minimization problems in which the objective is the sum of a (potentially non-smooth) convex regularizer and an average of $n$ smooth objectives. ProxRR evaluates the proximal operator once per epoch only. When the proximal operator is expensive to compute, this small difference makes ProxRR up to $n$ times faster than algorithms that evaluate the proximal operator in every iteration, such as proximal (stochastic) gradient descent. We give examples of practical optimization tasks where the proximal operator is difficult to compute and ProxRR has a clear advantage. One such task is federated or distributed optimization, where the evaluation of the proximal operator corresponds to communication across the network. We obtain our second algorithm, FedRR, as a special case of ProxRR applied to federated optimization, and prove it has a smaller communication footprint than either distributed gradient descent or Local SGD. Our theory covers both constant and decreasing stepsizes, and allows for importance resampling schemes that can improve conditioning, which may be of independent interest. Our theory covers both convex and nonconvex regimes. Finally, we corroborate our results with experiments on real data sets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/mishchenko22a.html PDF: https://proceedings.mlr.press/v162/mishchenko22a/mishchenko22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mishchenko22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Konstantin family: Mishchenko - given: Ahmed family: Khaled - given: Peter family: Richtarik editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15718-15749 id: mishchenko22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15718 lastpage: 15749 published: 2022-06-28 00:00:00 +0000 - title: 'ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!' abstract: 'We introduce ProxSkip—a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of $f$ and the prox operator of $\psi$ in each iteration. In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. ProxSkip allows for the expensive prox operator to be skipped in most iterations: while its iteration complexity is $\mathcal{O}(\kappa \log \nicefrac{1}{\varepsilon})$, where $\kappa$ is the condition number of $f$, the number of prox evaluations is $\mathcal{O}(\sqrt{\kappa} \log \nicefrac{1}{\varepsilon})$ only. Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local GD step independently on all devices, and evaluation of prox corresponds to (expensive) communication in the form of gradient averaging. In this context, ProxSkip offers an effective acceleration of communication complexity. Unlike other local gradient-type methods, such as FedAvg, SCAFFOLD, S-Local-GD and FedLin, whose theoretical communication complexity is worse than, or at best matching, that of vanilla GD in the heterogeneous data regime, we obtain a provable and large improvement without any heterogeneity-bounding assumptions.' 
volume: 162 URL: https://proceedings.mlr.press/v162/mishchenko22b.html PDF: https://proceedings.mlr.press/v162/mishchenko22b/mishchenko22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mishchenko22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Konstantin family: Mishchenko - given: Grigory family: Malinovsky - given: Sebastian family: Stich - given: Peter family: Richtarik editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15750-15769 id: mishchenko22b issued: date-parts: - 2022 - 6 - 28 firstpage: 15750 lastpage: 15769 published: 2022-06-28 00:00:00 +0000 - title: 'Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions' abstract: 'We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex re-formulation of the standard weight-decay penalized training problem as a set of group-l1-regularized data-local models, where locality is enforced by polyhedral cone constraints. In the special case of zero-regularization, we show that this problem is exactly equivalent to unconstrained optimization of a convex "gated ReLU" network. For problems with non-zero regularization, we show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem. To optimize the convex re-formulations, we develop an accelerated proximal gradient method and a practical augmented Lagrangian solver. We show that these approaches are faster than standard training heuristics for the non-convex problem, such as SGD, and outperform commercial interior-point solvers. Experimentally, we verify our theoretical results, explore the group-l1 regularization path, and scale convex optimization for neural networks to image classification on MNIST and CIFAR-10.' volume: 162 URL: https://proceedings.mlr.press/v162/mishkin22a.html PDF: https://proceedings.mlr.press/v162/mishkin22a/mishkin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mishkin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aaron family: Mishkin - given: Arda family: Sahiner - given: Mert family: Pilanci editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15770-15816 id: mishkin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15770 lastpage: 15816 published: 2022-06-28 00:00:00 +0000 - title: 'Memory-Based Model Editing at Scale' abstract: 'Even the largest neural networks make errors, and once-correct predictions can become invalid as the world changes. Model editors make local updates to the behavior of base (pre-trained) models to inject updated knowledge or correct undesirable behaviors. 
Existing model editors have shown promise, but also suffer from insufficient expressiveness: they struggle to accurately model an edit’s intended scope (examples affected by the edit), leading to inaccurate predictions for test inputs loosely related to the edit, and they often fail altogether after many edits. As a higher-capacity alternative, we propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC), which stores edits in an explicit memory and learns to reason over them to modulate the base model’s predictions as needed. To enable more rigorous evaluation of model editors, we introduce three challenging language model editing problems based on question answering, fact-checking, and dialogue generation. We find that only SERAC achieves high performance on all three problems, consistently outperforming existing approaches to model editing by a significant margin. Code, data, and additional project information will be made available at https://sites.google.com/view/serac-editing.' volume: 162 URL: https://proceedings.mlr.press/v162/mitchell22a.html PDF: https://proceedings.mlr.press/v162/mitchell22a/mitchell22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mitchell22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Eric family: Mitchell - given: Charles family: Lin - given: Antoine family: Bosselut - given: Christopher D family: Manning - given: Chelsea family: Finn editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15817-15831 id: mitchell22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15817 lastpage: 15831 published: 2022-06-28 00:00:00 +0000 - title: 'Invariant Ancestry Search' abstract: 'Recently, methods have been proposed that exploit the invariance of prediction models with respect to changing environments to infer subsets of the causal parents of a response variable. If the environments influence only few of the underlying mechanisms, the subset identified by invariant causal prediction (ICP), for example, may be small, or even empty. We introduce the concept of minimal invariance and propose invariant ancestry search (IAS). In its population version, IAS outputs a set which contains only ancestors of the response and is a superset of the output of ICP. When applied to data, corresponding guarantees hold asymptotically if the underlying test for invariance has asymptotic level and power. We develop scalable algorithms and perform experiments on simulated and real data.' 
volume: 162 URL: https://proceedings.mlr.press/v162/mogensen22a.html PDF: https://proceedings.mlr.press/v162/mogensen22a/mogensen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mogensen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Phillip B family: Mogensen - given: Nikolaj family: Thams - given: Jonas family: Peters editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15832-15857 id: mogensen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15832 lastpage: 15857 published: 2022-06-28 00:00:00 +0000 - title: 'Differentially Private Community Detection for Stochastic Block Models' abstract: 'The goal of community detection over graphs is to recover underlying labels/attributes of users (e.g., political affiliation) given the connectivity between users. There has been significant recent progress on understanding the fundamental limits of community detection when the graph is generated from a stochastic block model (SBM). Specifically, sharp information theoretic limits and efficient algorithms have been obtained for SBMs as a function of $p$ and $q$, which represent the intra-community and inter-community connection probabilities. In this paper, we study the community detection problem while preserving the privacy of the individual connections between the vertices. Focusing on the notion of $(\epsilon, \delta)$-edge differential privacy (DP), we seek to understand the fundamental tradeoffs between $(p, q)$, DP budget $(\epsilon, \delta)$, and computational efficiency for exact recovery of community labels. To this end, we present and analyze the associated information-theoretic tradeoffs for three differentially private community recovery mechanisms: a) stability based mechanism; b) sampling based mechanisms; and c) graph perturbation mechanisms. Our main findings are that stability and sampling based mechanisms lead to a superior tradeoff between $(p,q)$ and the privacy budget $(\epsilon, \delta)$; however this comes at the expense of higher computational complexity. On the other hand, albeit low complexity, graph perturbation mechanisms require the privacy budget $\epsilon$ to scale as $\Omega(\log(n))$ for exact recovery.' 
volume: 162 URL: https://proceedings.mlr.press/v162/mohamed22a.html PDF: https://proceedings.mlr.press/v162/mohamed22a/mohamed22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mohamed22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mohamed S family: Mohamed - given: Dung family: Nguyen - given: Anil family: Vullikanti - given: Ravi family: Tandon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15858-15894 id: mohamed22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15858 lastpage: 15894 published: 2022-06-28 00:00:00 +0000 - title: 'A Multi-objective / Multi-task Learning Framework Induced by Pareto Stationarity' abstract: 'Multi-objective optimization (MOO) and multi-task learning (MTL) have gained much popularity with prevalent use cases such as production model development of regression / classification / ranking models with MOO, and training deep learning models with MTL. Despite the long history of research in MOO, its application to machine learning requires the development of a solution strategy, and algorithms have recently been developed to solve specific problems such as discovering any Pareto optimal (PO) solution, or one with a particular form of preference. In this paper, we develop a novel and generic framework to discover a PO solution with multiple forms of preferences. It allows us to formulate a generic MOO / MTL problem to express a preference, which is solved to achieve both alignment with the preference and PO at the same time. Specifically, we apply the framework to solve the weighted Chebyshev problem and an extension of it. The former is known as a method to discover the Pareto front; the latter helps to find a model that outperforms an existing model with only one run. Experimental results demonstrate that the method not only achieves competitive performance with existing methods, but also allows us to achieve the desired performance under different forms of preferences.' volume: 162 URL: https://proceedings.mlr.press/v162/momma22a.html PDF: https://proceedings.mlr.press/v162/momma22a/momma22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-momma22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Michinari family: Momma - given: Chaosheng family: Dong - given: Jia family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15895-15907 id: momma22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15895 lastpage: 15907 published: 2022-06-28 00:00:00 +0000 - title: 'EqR: Equivariant Representations for Data-Efficient Reinforcement Learning' abstract: 'We study a variety of notions of equivariance as an inductive bias in Reinforcement Learning (RL). In particular, we propose new mechanisms for learning representations that are equivariant to both the agent’s action and symmetry transformations of the state-action pairs. 
Whereas prior work on exploiting symmetries in deep RL can only incorporate predefined linear transformations, our approach allows non-linear symmetry transformations of state-action pairs to be learned from the data. This is achieved through 1) equivariant Lie algebraic parameterization of state and action encodings, 2) equivariant latent transition models, and 3) the incorporation of symmetry-based losses. We demonstrate the advantages of our method, which we call Equivariant representations for RL (EqR), for Atari games in a data-efficient setting limited to 100K steps of interactions with the environment.' volume: 162 URL: https://proceedings.mlr.press/v162/mondal22a.html PDF: https://proceedings.mlr.press/v162/mondal22a/mondal22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mondal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Arnab Kumar family: Mondal - given: Vineet family: Jain - given: Kaleem family: Siddiqi - given: Siamak family: Ravanbakhsh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15908-15926 id: mondal22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15908 lastpage: 15926 published: 2022-06-28 00:00:00 +0000 - title: 'Feature and Parameter Selection in Stochastic Linear Bandits' abstract: 'We study two model selection settings in stochastic linear bandits (LB). In the first setting, which we refer to as feature selection, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models). In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$. However, the agent only has access to misspecified models, i.e., estimates of the centers and radii of the balls. We refer to this setting as parameter selection. For each setting, we develop and analyze a computationally efficient algorithm that is based on a reduction from bandits to full-information problems. This allows us to obtain regret bounds that are not worse (up to a $\sqrt{\log M}$ factor) than the case where the true model is known. This is the best reported dependence on the number of models $M$ in these settings. Finally, we empirically show the effectiveness of our algorithms using synthetic and real-world experiments.' 
volume: 162 URL: https://proceedings.mlr.press/v162/moradipari22a.html PDF: https://proceedings.mlr.press/v162/moradipari22a/moradipari22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-moradipari22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ahmadreza family: Moradipari - given: Berkay family: Turan - given: Yasin family: Abbasi-Yadkori - given: Mahnoosh family: Alizadeh - given: Mohammad family: Ghavamzadeh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15927-15958 id: moradipari22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15927 lastpage: 15958 published: 2022-06-28 00:00:00 +0000 - title: 'Power-Law Escape Rate of SGD' abstract: 'Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. Using this formalism, we show that the log loss barrier $\Delta\log L=\log[L(\theta^s)/L(\theta^*)]$ between a local minimum $\theta^*$ and a saddle $\theta^s$ determines the escape rate of SGD from the local minimum, contrary to the previous results borrowing from physics that the linear loss barrier $\Delta L=L(\theta^s)-L(\theta^*)$ decides the escape rate. Our escape-rate formula strongly depends on the typical magnitude $h^*$ and the number $n$ of the outlier eigenvalues of the Hessian. This result explains an empirical fact that SGD prefers flat minima with low effective dimensions, giving an insight into implicit biases of SGD.' volume: 162 URL: https://proceedings.mlr.press/v162/mori22a.html PDF: https://proceedings.mlr.press/v162/mori22a/mori22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mori22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Takashi family: Mori - given: Liu family: Ziyin - given: Kangqiao family: Liu - given: Masahito family: Ueda editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15959-15975 id: mori22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15959 lastpage: 15975 published: 2022-06-28 00:00:00 +0000 - title: 'Rethinking Fano’s Inequality in Ensemble Learning' abstract: 'We propose a fundamental theory on ensemble learning that evaluates a given ensemble system by a well-grounded set of metrics. Previous studies used a variant of Fano’s inequality of information theory and derived a lower bound of the classification error rate on the basis of the accuracy and diversity of models. We revisit the original Fano’s inequality and argue that the studies did not take into account the information lost when multiple model predictions are combined into a final prediction. To address this issue, we generalize the previous theory to incorporate the information loss. Further, we empirically validate and demonstrate the proposed theory through extensive experiments on actual systems. 
The theory reveals the strengths and weaknesses of systems on each metric, which will push the theoretical understanding of ensemble learning and give us insights into designing systems.' volume: 162 URL: https://proceedings.mlr.press/v162/morishita22a.html PDF: https://proceedings.mlr.press/v162/morishita22a/morishita22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-morishita22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Terufumi family: Morishita - given: Gaku family: Morio - given: Shota family: Horiguchi - given: Hiroaki family: Ozaki - given: Nobuo family: Nukaga editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 15976-16016 id: morishita22a issued: date-parts: - 2022 - 6 - 28 firstpage: 15976 lastpage: 16016 published: 2022-06-28 00:00:00 +0000 - title: 'SpeqNets: Sparsity-aware permutation-equivariant graph networks' abstract: 'While message-passing graph neural networks have clear limitations in approximating permutation-equivariant functions over graphs or general relational data, more expressive, higher-order graph neural networks do not scale to large graphs. They either operate on $k$-order tensors or consider all $k$-node subgraphs, implying an exponential dependence on $k$ in memory requirements, and do not adapt to the sparsity of the graph. By introducing new heuristics for the graph isomorphism problem, we devise a class of universal, permutation-equivariant graph networks, which, unlike previous architectures, offer a fine-grained control between expressivity and scalability and adapt to the sparsity of the graph. These architectures lead to vastly reduced computation times compared to standard higher-order graph networks in the supervised node- and graph-level classification and regression regime while significantly improving standard graph neural network and graph kernel architectures in terms of predictive performance.' volume: 162 URL: https://proceedings.mlr.press/v162/morris22a.html PDF: https://proceedings.mlr.press/v162/morris22a/morris22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-morris22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Christopher family: Morris - given: Gaurav family: Rattan - given: Sandra family: Kiefer - given: Siamak family: Ravanbakhsh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16017-16042 id: morris22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16017 lastpage: 16042 published: 2022-06-28 00:00:00 +0000 - title: 'CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer' abstract: 'Transformer has achieved great successes in learning vision and language representation, which is general across various downstream tasks. In visual control, learning transferable state representation that can transfer between different control tasks is important to reduce the training sample size. However, porting Transformer to sample-efficient visual control remains a challenging and unsolved problem. 
To this end, we propose a novel Control Transformer (CtrlFormer), possessing many appealing benefits that prior arts do not have. Firstly, CtrlFormer jointly learns self-attention mechanisms between visual tokens and policy tokens among different control tasks, where multitask representation can be learned and transferred without catastrophic forgetting. Secondly, we carefully design a contrastive reinforcement learning paradigm to train CtrlFormer, enabling it to achieve high sample efficiency, which is important in control problems. For example, in the DMControl benchmark, unlike recent advanced methods that failed by producing a zero score in the “Cartpole” task after transfer learning with 100k samples, CtrlFormer can achieve a state-of-the-art score with only 100k samples while maintaining the performance of previous tasks. The code and models are released in our project homepage.' volume: 162 URL: https://proceedings.mlr.press/v162/mu22a.html PDF: https://proceedings.mlr.press/v162/mu22a/mu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yao Mark family: Mu - given: Shoufa family: Chen - given: Mingyu family: Ding - given: Jianyu family: Chen - given: Runjian family: Chen - given: Ping family: Luo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16043-16061 id: mu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16043 lastpage: 16061 published: 2022-06-28 00:00:00 +0000 - title: 'Generalized Beliefs for Cooperative AI' abstract: 'Self-play is a common method for constructing solutions in Markov games that can yield optimal policies in collaborative settings. However, these policies often adopt highly-specialized conventions that make playing with a novel partner difficult. To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental assumptions and can complicate policy training. To overcome this, we propose moving the learning of conventions to the belief space. Specifically, we propose a belief learning paradigm that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time. We show how to leverage this belief model for both search and training of a best response over a pool of policies to greatly improve zero-shot coordination. We also show how our paradigm promotes explainability and interpretability of nuanced agent conventions.' 
volume: 162 URL: https://proceedings.mlr.press/v162/muglich22a.html PDF: https://proceedings.mlr.press/v162/muglich22a/muglich22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-muglich22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Darius family: Muglich - given: Luisa M family: Zintgraf - given: Christian A Schroeder family: De Witt - given: Shimon family: Whiteson - given: Jakob family: Foerster editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16062-16082 id: muglich22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16062 lastpage: 16082 published: 2022-06-28 00:00:00 +0000 - title: 'Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis' abstract: 'A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors. We observe that by instead initializing the weights into independent pairs, where each pair consists of two identical Gaussian vectors, we can significantly improve the convergence analysis. While a similar technique has been studied for random inputs [Daniely, NeurIPS 2020], it has not been analyzed with arbitrary inputs. Using this technique, we show how to significantly reduce the number of neurons required for two-layer ReLU networks, both in the under-parameterized setting with logistic loss, from roughly $\gamma^{-8}$ [Ji and Telgarsky, ICLR 2020] to $\gamma^{-2}$, where $\gamma$ denotes the separation margin with a Neural Tangent Kernel, as well as in the over-parameterized setting with squared loss, from roughly $n^4$ [Song and Yang, 2019] to $n^2$, implicitly also improving the recent running time bound of [Brand, Peng, Song and Weinstein, ITCS 2021]. For the under-parameterized setting we also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.' volume: 162 URL: https://proceedings.mlr.press/v162/munteanu22a.html PDF: https://proceedings.mlr.press/v162/munteanu22a/munteanu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-munteanu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexander family: Munteanu - given: Simon family: Omlor - given: Zhao family: Song - given: David family: Woodruff editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16083-16122 id: munteanu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16083 lastpage: 16122 published: 2022-06-28 00:00:00 +0000 - title: 'Constants Matter: The Performance Gains of Active Learning' abstract: 'Within machine learning, active learning studies the gains in performance made possible by adaptively selecting data points to label. In this work, we show, through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling has the same inverse proportional dependence on the number of samples.
Importantly, due to the nature of lower bounds, any more general setting does not allow a better dependence on the number of samples. Additionally, we show a variant of uncertainty sampling can achieve a faster rate of convergence than random sampling by a factor of the Bayes error, a recent empirical observation made by other work. Qualitatively, this work is pessimistic with respect to the asymptotic dependence on the number of samples, but optimistic with respect to finding performance gains in the constants.' volume: 162 URL: https://proceedings.mlr.press/v162/mussmann22a.html PDF: https://proceedings.mlr.press/v162/mussmann22a/mussmann22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mussmann22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Stephen O family: Mussmann - given: Sanjoy family: Dasgupta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16123-16173 id: mussmann22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16123 lastpage: 16173 published: 2022-06-28 00:00:00 +0000 - title: 'On the Generalization Analysis of Adversarial Learning' abstract: 'Many recent studies have highlighted the susceptibility of virtually all machine-learning models to adversarial attacks. Adversarial attacks are imperceptible changes to an input example of a given prediction model. Such changes are carefully designed to alter the otherwise correct prediction of the model. In this paper, we study the generalization properties of adversarial learning. In particular, we derive high-probability generalization bounds on the adversarial risk in terms of the empirical adversarial risk, the complexity of the function class and the adversarial noise set. Our bounds are generally applicable to many models, losses, and adversaries. We showcase its applicability by deriving adversarial generalization bounds for the multi-class classification setting and various prediction models (including linear models and Deep Neural Networks). We also derive optimistic adversarial generalization bounds for the case of smooth losses. These are the first fast-rate bounds valid for adversarial deep learning to the best of our knowledge.' volume: 162 URL: https://proceedings.mlr.press/v162/mustafa22a.html PDF: https://proceedings.mlr.press/v162/mustafa22a/mustafa22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mustafa22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Waleed family: Mustafa - given: Yunwen family: Lei - given: Marius family: Kloft editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16174-16196 id: mustafa22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16174 lastpage: 16196 published: 2022-06-28 00:00:00 +0000 - title: 'Universal and data-adaptive algorithms for model selection in linear contextual bandits' abstract: 'Model selection in contextual bandits is an important complementary problem to regret minimization with respect to a fixed model class. 
We consider the simplest non-trivial instance of model selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem. Even in this instance, current state-of-the-art methods explore in a suboptimal manner and require strong "feature-diversity" conditions. In this paper, we introduce new algorithms that a) explore in a data-adaptive manner, and b) provide model selection guarantees of the form $O(d^{\alpha} T^{1 - \alpha})$ with no feature diversity conditions whatsoever, where $d$ denotes the dimension of the linear model and $T$ denotes the total number of rounds. The first algorithm enjoys a "best-of-both-worlds" property, recovering two prior results that hold under distinct distributional assumptions, simultaneously. The second removes distributional assumptions altogether, expanding the scope for tractable model selection. Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.' volume: 162 URL: https://proceedings.mlr.press/v162/muthukumar22a.html PDF: https://proceedings.mlr.press/v162/muthukumar22a/muthukumar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-muthukumar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vidya K family: Muthukumar - given: Akshay family: Krishnamurthy editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16197-16222 id: muthukumar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16197 lastpage: 16222 published: 2022-06-28 00:00:00 +0000 - title: 'The Importance of Non-Markovianity in Maximum State Entropy Exploration' abstract: 'In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the expected state visitations it is inducing. Hazan et al. (2019) noted that the class of Markovian stochastic policies is sufficient for the maximum state entropy objective, and exploiting non-Markovianity is generally considered pointless in this setting. In this paper, we argue that non-Markovianity is instead paramount for maximum state entropy exploration in a finite-sample regime. In particular, we recast the objective to target the expected entropy of the induced state visitations in a single trial. Then, we show that the class of non-Markovian deterministic policies is sufficient for the introduced objective, while Markovian policies suffer non-zero regret in general. However, we prove that the problem of finding an optimal non-Markovian policy is NP-hard. Despite this negative result, we discuss avenues to address the problem in a tractable way and how non-Markovian exploration could benefit the sample efficiency of online reinforcement learning in future works.'
volume: 162 URL: https://proceedings.mlr.press/v162/mutti22a.html PDF: https://proceedings.mlr.press/v162/mutti22a/mutti22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-mutti22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mirco family: Mutti - given: Riccardo family: De Santi - given: Marcello family: Restelli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16223-16239 id: mutti22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16223 lastpage: 16239 published: 2022-06-28 00:00:00 +0000 - title: 'PAC-Net: A Model Pruning Approach to Inductive Transfer Learning' abstract: 'Inductive transfer learning aims to learn from a small amount of training data for the target task by utilizing a pre-trained model from the source task. Most strategies that involve large-scale deep learning models adopt initialization with the pre-trained model and fine-tuning for the target task. However, when using over-parameterized models, we can often prune the model without sacrificing the accuracy of the source task. This motivates us to adopt model pruning for transfer learning with deep learning models. In this paper, we propose PAC-Net, a simple yet effective approach for transfer learning based on pruning. PAC-Net consists of three steps: Prune, Allocate, and Calibrate (PAC). The main idea behind these steps is to identify essential weights for the source task, fine-tune on the source task by updating the essential weights, and then calibrate on the target task by updating the remaining redundant weights. Under the various and extensive set of inductive transfer learning experiments, we show that our method achieves state-of-the-art performance by a large margin.' volume: 162 URL: https://proceedings.mlr.press/v162/myung22a.html PDF: https://proceedings.mlr.press/v162/myung22a/myung22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-myung22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sanghoon family: Myung - given: In family: Huh - given: Wonik family: Jang - given: Jae Myung family: Choe - given: Jisu family: Ryu - given: Daesin family: Kim - given: Kee-Eung family: Kim - given: Changwook family: Jeong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16240-16252 id: myung22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16240 lastpage: 16252 published: 2022-06-28 00:00:00 +0000 - title: 'AutoSNN: Towards Energy-Efficient Spiking Neural Networks' abstract: 'Spiking neural networks (SNNs) that mimic information transmission in the brain can energy-efficiently process spatio-temporal information through discrete and sparse spikes, thereby receiving considerable attention. To improve accuracy and energy efficiency of SNNs, most previous studies have focused solely on training methods, and the effect of architecture has rarely been studied. 
We investigate the design choices used in the previous studies in terms of the accuracy and number of spikes and figure out that they are not best-suited for SNNs. To further improve the accuracy and reduce the spikes generated by SNNs, we propose a spike-aware neural architecture search framework called AutoSNN. We define a search space consisting of architectures without undesirable design choices. To enable the spike-aware architecture search, we introduce a fitness that considers both the accuracy and number of spikes. AutoSNN successfully searches for SNN architectures that outperform hand-crafted SNNs in accuracy and energy efficiency. We thoroughly demonstrate the effectiveness of AutoSNN on various datasets including neuromorphic datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/na22a.html PDF: https://proceedings.mlr.press/v162/na22a/na22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-na22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Byunggook family: Na - given: Jisoo family: Mok - given: Seongsik family: Park - given: Dongjin family: Lee - given: Hyeokjun family: Choe - given: Sungroh family: Yoon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16253-16269 id: na22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16253 lastpage: 16269 published: 2022-06-28 00:00:00 +0000 - title: 'Implicit Bias of the Step Size in Linear Diagonal Neural Networks' abstract: 'Focusing on diagonal linear networks as a model for understanding the implicit bias in underdetermined models, we show how the gradient descent step size can have a large qualitative effect on the implicit bias, and thus on generalization ability. In particular, we show how using large step size for non-centered data can change the implicit bias from a "kernel" type behavior to a "rich" (sparsity-inducing) regime — even when gradient flow, studied in previous works, would not escape the "kernel" regime. We do so by using dynamic stability, proving that convergence to dynamically stable global minima entails a bound on some weighted $\ell_1$-norm of the linear predictor, i.e. a "rich" regime. We prove this leads to good generalization in a sparse regression setting.' volume: 162 URL: https://proceedings.mlr.press/v162/nacson22a.html PDF: https://proceedings.mlr.press/v162/nacson22a/nacson22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nacson22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mor Shpigel family: Nacson - given: Kavya family: Ravichandran - given: Nathan family: Srebro - given: Daniel family: Soudry editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16270-16295 id: nacson22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16270 lastpage: 16295 published: 2022-06-28 00:00:00 +0000 - title: 'DNNR: Differential Nearest Neighbors Regression' abstract: 'K-nearest neighbors (KNN) is one of the earliest and most established algorithms in machine learning. 
For regression tasks, KNN averages the targets within a neighborhood, which poses a number of challenges: the neighborhood definition is crucial for the predictive performance as neighbors might be selected based on uninformative features, and averaging does not account for how the function changes locally. We propose a novel method called Differential Nearest Neighbors Regression (DNNR) that addresses both issues simultaneously: during training, DNNR estimates local gradients to scale the features; during inference, it performs an $n$-th order Taylor approximation using estimated gradients. In a large-scale evaluation on over 250 datasets, we find that DNNR performs comparably to state-of-the-art gradient boosting methods and MLPs while maintaining the simplicity and transparency of KNN. This allows us to derive theoretical error bounds and inspect failures. In times that call for transparency of ML models, DNNR provides a good balance between performance and interpretability.' volume: 162 URL: https://proceedings.mlr.press/v162/nader22a.html PDF: https://proceedings.mlr.press/v162/nader22a/nader22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nader22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Youssef family: Nader - given: Leon family: Sixt - given: Tim family: Landgraf editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16296-16317 id: nader22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16296 lastpage: 16317 published: 2022-06-28 00:00:00 +0000 - title: 'Overcoming Oscillations in Quantization-Aware Training' abstract: 'When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid-points. The importance of this effect and its impact on quantization-aware training (QAT) are not well understood or investigated in the literature. In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to a significant accuracy degradation due to wrongly estimated batch-normalization statistics during inference and increased noise during training. These effects are particularly pronounced in low-bit ($\leq$ 4-bits) quantization of efficient networks with depth-wise separable layers, such as MobileNets and EfficientNets. In our analysis we investigate several previously proposed QAT algorithms and show that most of these are unable to overcome oscillations. Finally, we propose two novel QAT algorithms to overcome oscillations during training: oscillation dampening and iterative weight freezing. We demonstrate that our algorithms achieve state-of-the-art accuracy for low-bit (3 & 4 bits) weight and activation quantization of efficient architectures, such as MobileNetV2, MobileNetV3, and EfficientNet-lite on ImageNet. Our source code is available at https://github.com/qualcomm-ai-research/oscillations-qat.'
volume: 162 URL: https://proceedings.mlr.press/v162/nagel22a.html PDF: https://proceedings.mlr.press/v162/nagel22a/nagel22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nagel22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Markus family: Nagel - given: Marios family: Fournarakis - given: Yelysei family: Bondarenko - given: Tijmen family: Blankevoort editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16318-16330 id: nagel22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16318 lastpage: 16330 published: 2022-06-28 00:00:00 +0000 - title: 'Strategic Representation' abstract: 'Humans have come to rely on machines for reducing excessive information to manageable representations. But this reliance can be abused – strategic machines might craft representations that manipulate their users. How can a user make good choices based on strategic representations? We formalize this as a learning problem, and pursue algorithms for decision-making that are robust to manipulation. In our main setting of interest, the system represents attributes of an item to the user, who then decides whether or not to consume. We model this interaction through the lens of strategic classification (Hardt et al. 2016), reversed: the user, who learns, plays first; and the system, which responds, plays second. The system must respond with representations that reveal ‘nothing but the truth’ but need not reveal the entire truth. Thus, the user faces the problem of learning set functions under strategic subset selection, which presents distinct algorithmic and statistical challenges. Our main result is a learning algorithm that minimizes error despite strategic representations, and our theoretical analysis sheds light on the trade-off between learning effort and susceptibility to manipulation.' volume: 162 URL: https://proceedings.mlr.press/v162/nair22a.html PDF: https://proceedings.mlr.press/v162/nair22a/nair22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nair22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vineet family: Nair - given: Ganesh family: Ghalme - given: Inbal family: Talgam-Cohen - given: Nir family: Rosenfeld editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16331-16352 id: nair22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16331 lastpage: 16352 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation' abstract: 'Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders applying them for resource-limited environments. It motivates distilling knowledge from the ensemble teacher into a smaller student network, and there are two important design choices for this ensemble distillation: 1) how to construct the student network, and 2) what data should be shown during training. 
In this paper, we propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers, but then those subnetworks are properly averaged for inference, giving a single student network with no additional inference cost. We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student. Combining these two, our method significantly improves upon previous methods on various image classification tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/nam22a.html PDF: https://proceedings.mlr.press/v162/nam22a/nam22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nam22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Giung family: Nam - given: Hyungi family: Lee - given: Byeongho family: Heo - given: Juho family: Lee editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16353-16367 id: nam22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16353 lastpage: 16367 published: 2022-06-28 00:00:00 +0000 - title: 'Measuring Representational Robustness of Neural Networks Through Shared Invariances' abstract: 'A major challenge in studying robustness in deep learning is defining the set of “meaningless” perturbations to which a given Neural Network (NN) should be invariant. Most work on robustness implicitly uses a human as the reference model to define such perturbations. Our work offers a new view on robustness by using another reference NN to define the set of perturbations a given NN should be invariant to, thus generalizing the reliance on a reference “human NN” to any NN. This makes measuring robustness equivalent to measuring the extent to which two NNs share invariances. We propose a measure called STIR, which faithfully captures the extent to which two NNs share invariances. STIR re-purposes existing representation similarity measures to make them suitable for measuring shared invariances. Using our measure, we are able to gain insights about how shared invariances vary with changes in weight initialization, architecture, loss functions, and training dataset. Our implementation is available at: https://github.com/nvedant07/STIR.'
volume: 162 URL: https://proceedings.mlr.press/v162/nanda22a.html PDF: https://proceedings.mlr.press/v162/nanda22a/nanda22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nanda22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vedant family: Nanda - given: Till family: Speicher - given: Camila family: Kolling - given: John P family: Dickerson - given: Krishna family: Gummadi - given: Adrian family: Weller editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16368-16382 id: nanda22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16368 lastpage: 16382 published: 2022-06-28 00:00:00 +0000 - title: 'Tight and Robust Private Mean Estimation with Few Users' abstract: 'In this work, we study high-dimensional mean estimation under user-level differential privacy, and design an $(\varepsilon,\delta)$-differentially private mechanism using as few users as possible. In particular, we provide a nearly optimal trade-off between the number of users and the number of samples per user required for private mean estimation, even when the number of users is as low as $O(\frac{1}{\varepsilon}\log\frac{1}{\delta})$. Interestingly, this bound on the number of users is independent of the dimension (though the number of samples per user is allowed to depend polynomially on the dimension), unlike the previous work that requires the number of users to depend polynomially on the dimension. This resolves a problem first proposed by Amin et al. (2019). Moreover, our mechanism is robust against corruptions in up to 49% of the users. Finally, our results also apply to optimal algorithms for privately learning discrete distributions with few users, answering a question of Liu et al. (2020), and a broader range of problems such as stochastic convex optimization and a variant of stochastic gradient descent via a reduction to differentially private mean estimation.' volume: 162 URL: https://proceedings.mlr.press/v162/narayanan22a.html PDF: https://proceedings.mlr.press/v162/narayanan22a/narayanan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-narayanan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shyam family: Narayanan - given: Vahab family: Mirrokni - given: Hossein family: Esfandiari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16383-16412 id: narayanan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16383 lastpage: 16412 published: 2022-06-28 00:00:00 +0000 - title: 'Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models' abstract: 'Aquatic locomotion is a classic fluid-structure interaction (FSI) problem of interest to biologists and engineers. Solving the fully coupled FSI equations for incompressible Navier-Stokes and finite elasticity is computationally expensive. Optimizing robotic swimmer design within such a system generally involves cumbersome, gradient-free procedures on top of the already costly simulation.
To address this challenge we present a novel, fully differentiable hybrid approach to FSI that combines a 2D direct numerical simulation for the deformable solid structure of the swimmer and a physics-constrained neural network surrogate to capture hydrodynamic effects of the fluid. For the deformable solid simulation of the swimmer’s body, we use state-of-the-art techniques from the field of computer graphics to speed up the finite-element method (FEM). For the fluid simulation, we use a U-Net architecture trained with a physics-based loss function to predict the flow field at each time step. The pressure and velocity field outputs from the neural network are sampled around the boundary of our swimmer using an immersed boundary method (IBM) to compute its swimming motion accurately and efficiently. We demonstrate the computational efficiency and differentiability of our hybrid simulator on a 2D carangiform swimmer. Due to differentiability, the simulator can be used for computational design of controls for soft bodies immersed in fluids via direct gradient-based optimization.' volume: 162 URL: https://proceedings.mlr.press/v162/nava22a.html PDF: https://proceedings.mlr.press/v162/nava22a/nava22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nava22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Elvis family: Nava - given: John Z family: Zhang - given: Mike Yan family: Michelis - given: Tao family: Du - given: Pingchuan family: Ma - given: Benjamin F. family: Grewe - given: Wojciech family: Matusik - given: Robert Kevin family: Katzschmann editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16413-16427 id: nava22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16413 lastpage: 16427 published: 2022-06-28 00:00:00 +0000 - title: 'Multi-Task Learning as a Bargaining Game' abstract: 'In Multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks. Joint training reduces computation costs and improves data efficiency; however, since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts. A common method for alleviating this issue is to combine per-task gradients into a joint update direction using a particular heuristic. In this paper, we propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update. Under certain assumptions, the bargaining problem has a unique solution, known as the Nash Bargaining Solution, which we propose to use as a principled approach to multi-task learning. We describe a new MTL optimization procedure, Nash-MTL, and derive theoretical guarantees for its convergence. Empirically, we show that Nash-MTL achieves state-of-the-art results on multiple MTL benchmarks in various domains.' 
volume: 162 URL: https://proceedings.mlr.press/v162/navon22a.html PDF: https://proceedings.mlr.press/v162/navon22a/navon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-navon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aviv family: Navon - given: Aviv family: Shamsian - given: Idan family: Achituve - given: Haggai family: Maron - given: Kenji family: Kawaguchi - given: Gal family: Chechik - given: Ethan family: Fetaya editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16428-16446 id: navon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16428 lastpage: 16446 published: 2022-06-28 00:00:00 +0000 - title: 'Variational Inference for Infinitely Deep Neural Networks' abstract: 'We introduce the unbounded depth neural network (UDN), an infinitely deep probabilistic model that adapts its complexity to the training data. The UDN contains an infinite sequence of hidden layers and places an unbounded prior on a truncation L, the layer from which it produces its data. Given a dataset of observations, the posterior UDN provides a conditional distribution of both the parameters of the infinite neural network and its truncation. We develop a novel variational inference algorithm to approximate this posterior, optimizing a distribution of the neural network weights and of the truncation depth L, and without any upper limit on L. To this end, the variational family has a special structure: it models neural network weights of arbitrary depth, and it dynamically creates or removes free variational parameters as its distribution of the truncation is optimized. (Unlike heuristic approaches to model search, it is solely through gradient-based optimization that this algorithm explores the space of truncations.) We study the UDN on real and synthetic data. We find that the UDN adapts its posterior depth to the dataset complexity; it outperforms standard neural networks of similar computational complexity; and it outperforms other approaches to infinite-depth neural networks.' volume: 162 URL: https://proceedings.mlr.press/v162/nazaret22a.html PDF: https://proceedings.mlr.press/v162/nazaret22a/nazaret22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nazaret22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Achille family: Nazaret - given: David family: Blei editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16447-16461 id: nazaret22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16447 lastpage: 16461 published: 2022-06-28 00:00:00 +0000 - title: 'Stable Conformal Prediction Sets' abstract: 'When one observes a sequence of variables $(x_1, y_1), \ldots, (x_n, y_n)$, Conformal Prediction (CP) is a methodology that allows to estimate a confidence set for $y_{n+1}$ given $x_{n+1}$ by merely assuming that the distribution of the data is exchangeable. CP sets have guaranteed coverage for any finite population size $n$. 
While appealing, the computation of such a set turns out to be infeasible in general, e.g., when the unknown variable $y_{n+1}$ is continuous. The bottleneck is that it is based on a procedure that readjusts a prediction model on data where we replace the unknown target by all its possible values in order to select the most probable one. This requires computing an infinite number of models, which often makes it intractable. In this paper, we combine CP techniques with classical algorithmic stability bounds to derive a prediction set computable with a single model fit. We demonstrate that our proposed confidence set does not lose any coverage guarantees while avoiding the need for data splitting as currently done in the literature. We provide some numerical experiments to illustrate the tightness of our estimation when the sample size is sufficiently large, on both synthetic and real datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/ndiaye22a.html PDF: https://proceedings.mlr.press/v162/ndiaye22a/ndiaye22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ndiaye22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Eugene family: Ndiaye editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16462-16479 id: ndiaye22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16462 lastpage: 16479 published: 2022-06-28 00:00:00 +0000 - title: 'Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning' abstract: 'In this work, we consider one-shot imitation learning for object rearrangement tasks, where an AI agent needs to watch a single expert demonstration and learn to perform the same task in different environments. To achieve a strong generalization, the AI agent must infer the spatial goal specification for the task. However, there can be multiple goal specifications that fit the given demonstration. To address this, we propose a reward learning approach, Graph-based Equivalence Mappings (GEM), that can discover spatial goal representations that are aligned with the intended goal specification, enabling successful generalization in unseen environments. Specifically, GEM represents a spatial goal specification by a reward function conditioned on i) a graph indicating important spatial relationships between objects and ii) state equivalence mappings for each edge in the graph indicating invariant properties of the corresponding relationship. GEM combines inverse reinforcement learning and active reward learning to efficiently improve the reward function by utilizing the graph structure and domain randomization enabled by the equivalence mappings. We conducted experiments with simulated oracles and with human subjects. The results show that GEM can drastically improve the generalizability of the learned goal representations over strong baselines.'
volume: 162 URL: https://proceedings.mlr.press/v162/netanyahu22a.html PDF: https://proceedings.mlr.press/v162/netanyahu22a/netanyahu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-netanyahu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aviv family: Netanyahu - given: Tianmin family: Shu - given: Joshua family: Tenenbaum - given: Pulkit family: Agrawal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16480-16495 id: netanyahu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16480 lastpage: 16495 published: 2022-06-28 00:00:00 +0000 - title: 'Sublinear-Time Clustering Oracle for Signed Graphs' abstract: 'Social networks are often modeled using signed graphs, where vertices correspond to users and edges have a sign that indicates whether an interaction between users was positive or negative. The arising signed graphs typically contain a clear community structure in the sense that the graph can be partitioned into a small number of polarized communities, each defining a sparse cut and indivisible into smaller polarized sub-communities. We provide a local clustering oracle for signed graphs with such a clear community structure, that can answer membership queries, i.e., “Given a vertex $v$, which community does $v$ belong to?”, in sublinear time by reading only a small portion of the graph. Formally, when the graph has bounded maximum degree and the number of communities is at most $O(\log n)$, then with $\tilde{O}(\sqrt{n}\operatorname{poly}(1/\varepsilon))$ preprocessing time, our oracle can answer each membership query in $\tilde{O}(\sqrt{n}\operatorname{poly}(1/\varepsilon))$ time, and it correctly classifies a $(1-\varepsilon)$-fraction of vertices w.r.t. a set of hidden planted ground-truth communities. Our oracle is desirable in applications where the clustering information is needed for only a small number of vertices. Previously, such local clustering oracles were only known for unsigned graphs; our generalization to signed graphs requires a number of new ideas and gives a novel spectral analysis of the behavior of random walks with signs. We evaluate our algorithm for constructing such an oracle and answering membership queries on both synthetic and real-world datasets, validating its performance in practice.' 
volume: 162 URL: https://proceedings.mlr.press/v162/neumann22a.html PDF: https://proceedings.mlr.press/v162/neumann22a/neumann22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-neumann22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Stefan family: Neumann - given: Pan family: Peng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16496-16528 id: neumann22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16496 lastpage: 16528 published: 2022-06-28 00:00:00 +0000 - title: 'Improved Regret for Differentially Private Exploration in Linear MDP' abstract: 'We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem due to Luyo et al. (2021) achieves a regret rate that has a dependence of $O(K^{3/5})$ on the number of episodes $K$. We provide a private algorithm with an improved regret rate with an optimal dependence of $O(\sqrt{K})$ on the number of episodes. The key recipe for our stronger regret guarantee is the adaptivity in the policy update schedule, in which an update only occurs when sufficient changes in the data are detected. As a result, our algorithm benefits from low switching cost and only performs $O(\log K)$ updates, which greatly reduces the amount of privacy noise. Finally, in the most prevalent privacy regimes where the privacy parameter $\epsilon$ is a constant, our algorithm incurs negligible privacy cost: in comparison with the existing non-private regret bounds, the additional regret due to privacy appears in lower-order terms.' volume: 162 URL: https://proceedings.mlr.press/v162/ngo22a.html PDF: https://proceedings.mlr.press/v162/ngo22a/ngo22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ngo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dung Daniel T family: Ngo - given: Giuseppe family: Vietri - given: Steven family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16529-16552 id: ngo22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16529 lastpage: 16552 published: 2022-06-28 00:00:00 +0000 - title: 'A Framework for Learning to Request Rich and Contextually Useful Information from Humans' abstract: 'When deployed, AI agents will encounter problems that are beyond their autonomous problem-solving capabilities. Leveraging human assistance can help agents overcome their inherent limitations and robustly cope with unfamiliar situations. We present a general interactive framework that enables an agent to request and interpret rich, contextually useful information from an assistant that has knowledge about the task and the environment. We demonstrate the practicality of our framework on a simulated human-assisted navigation problem.
Aided with an assistance-requesting policy learned by our method, a navigation agent achieves up to a 7{\texttimes} improvement in success rate on tasks that take place in previously unseen environments, compared to fully autonomous behavior. We show that the agent can take advantage of different types of information depending on the context, and analyze the benefits and challenges of learning the assistance-requesting policy when the assistant can recursively decompose tasks into subtasks.' volume: 162 URL: https://proceedings.mlr.press/v162/nguyen22a.html PDF: https://proceedings.mlr.press/v162/nguyen22a/nguyen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nguyen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Khanh X family: Nguyen - given: Yonatan family: Bisk - given: Hal Daumé family: Iii editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16553-16568 id: nguyen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16553 lastpage: 16568 published: 2022-06-28 00:00:00 +0000 - title: 'Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling' abstract: 'Neural Processes (NPs) are a popular class of approaches for meta-learning. Similar to Gaussian Processes (GPs), NPs define distributions over functions and can estimate uncertainty in their predictions. However, unlike GPs, NPs and their variants suffer from underfitting and often have intractable likelihoods, which limit their applications in sequential decision making. We propose Transformer Neural Processes (TNPs), a new member of the NP family that casts uncertainty-aware meta learning as a sequence modeling problem. We learn TNPs via an autoregressive likelihood-based objective and instantiate it with a novel transformer-based architecture that respects the inductive biases inherent to the problem structure, such as invariance to the observed data points and equivariance to the unobserved points. We further design knobs within the TNP architecture to tradeoff the increase in expressivity of the decoding distribution with extra computation. Empirically, we show that TNPs achieve state-of-the-art performance on various benchmark problems, outperforming all previous NP variants on meta regression, image completion, contextual multi-armed bandits, and Bayesian optimization.' 
volume: 162 URL: https://proceedings.mlr.press/v162/nguyen22b.html PDF: https://proceedings.mlr.press/v162/nguyen22b/nguyen22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nguyen22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tung family: Nguyen - given: Aditya family: Grover editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16569-16594 id: nguyen22b issued: date-parts: - 2022 - 6 - 28 firstpage: 16569 lastpage: 16594 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Transformers with Probabilistic Attention Keys' abstract: 'Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that for many applications, those attention heads learn redundant embedding, and most of them can be removed without degrading the performance of the model. Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head. These mixtures of keys follow a Gaussian mixture model and allow each attention head to focus on different parts of the input sequence efficiently. Compared to its conventional transformer counterpart, Transformer-MGK accelerates training and inference, has fewer parameters, and requires fewer FLOPs to compute while achieving comparable or better accuracy across tasks. Transformer-MGK can also be easily extended to use with linear attention. We empirically demonstrate the advantage of Transformer-MGK in a range of practical applications, including language modeling and tasks that involve very long sequences. On the Wikitext-103 and Long Range Arena benchmark, Transformer-MGKs with 4 heads attain comparable or better performance to the baseline transformers with 8 heads.' volume: 162 URL: https://proceedings.mlr.press/v162/nguyen22c.html PDF: https://proceedings.mlr.press/v162/nguyen22c/nguyen22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nguyen22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tam Minh family: Nguyen - given: Tan Minh family: Nguyen - given: Dung D. D. family: Le - given: Duy Khuong family: Nguyen - given: Viet-Anh family: Tran - given: Richard family: Baraniuk - given: Nhat family: Ho - given: Stanley family: Osher editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16595-16621 id: nguyen22c issued: date-parts: - 2022 - 6 - 28 firstpage: 16595 lastpage: 16621 published: 2022-06-28 00:00:00 +0000 - title: 'On Transportation of Mini-batches: A Hierarchical Approach' abstract: 'Mini-batch optimal transport (m-OT) has been successfully used in practical applications that involve probability measures with a very high number of supports. 
The m-OT solves several smaller optimal transport problems and then returns the average of their costs and transportation plans. Despite its scalability advantage, the m-OT does not consider the relationship between mini-batches which leads to undesirable estimation. Moreover, the m-OT does not approximate a proper metric between probability measures since the identity property is not satisfied. To address these problems, we propose a novel mini-batch scheme for optimal transport, named Batch of Mini-batches Optimal Transport (BoMb-OT), that finds the optimal coupling between mini-batches and it can be seen as an approximation to a well-defined distance on the space of probability measures. Furthermore, we show that the m-OT is a limit of the entropic regularized version of the BoMb-OT when the regularized parameter goes to infinity. Finally, we carry out experiments on various applications including deep generative models, deep domain adaptation, approximate Bayesian computation, color transfer, and gradient flow to show that the BoMb-OT can be widely applied and performs well in various applications.' volume: 162 URL: https://proceedings.mlr.press/v162/nguyen22d.html PDF: https://proceedings.mlr.press/v162/nguyen22d/nguyen22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nguyen22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Khai family: Nguyen - given: Dang family: Nguyen - given: Quoc Dinh family: Nguyen - given: Tung family: Pham - given: Hung family: Bui - given: Dinh family: Phung - given: Trung family: Le - given: Nhat family: Ho editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16622-16655 id: nguyen22d issued: date-parts: - 2022 - 6 - 28 firstpage: 16622 lastpage: 16655 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Mini-batch Optimal Transport via Partial Transportation' abstract: 'Mini-batch optimal transport (m-OT) has been widely used recently to deal with the memory issue of OT in large-scale applications. Despite their practicality, m-OT suffers from misspecified mappings, namely, mappings that are optimal on the mini-batch level but are partially wrong in the comparison with the optimal transportation plan between the original measures. Motivated by the misspecified mappings issue, we propose a novel mini-batch method by using partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). Leveraging the insight from the partial transportation, we explain the source of misspecified mappings from the m-OT and motivate why limiting the amount of transported masses among mini-batches via POT can alleviate the incorrect mappings. Finally, we carry out extensive experiments on various applications such as deep domain adaptation, partial domain adaptation, deep generative model, color transfer, and gradient flow to demonstrate the favorable performance of m-POT compared to current mini-batch methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/nguyen22e.html PDF: https://proceedings.mlr.press/v162/nguyen22e/nguyen22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nguyen22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Khai family: Nguyen - given: Dang family: Nguyen - given: The-Anh family: Vu-Le - given: Tung family: Pham - given: Nhat family: Ho editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16656-16690 id: nguyen22e issued: date-parts: - 2022 - 6 - 28 firstpage: 16656 lastpage: 16690 published: 2022-06-28 00:00:00 +0000 - title: 'Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs' abstract: 'Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory-based architectures, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. However, prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs. This paper revisits this claim. We find that careful architecture and hyperparameter decisions can often yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques. We compare to 21 environments from 6 prior specialized methods and find that our implementation achieves greater sample efficiency and asymptotic performance than these methods on 18/21 environments. We also release a simple and efficient implementation of recurrent model-free RL for future work to use as a baseline for POMDPs.' volume: 162 URL: https://proceedings.mlr.press/v162/ni22a.html PDF: https://proceedings.mlr.press/v162/ni22a/ni22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ni22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianwei family: Ni - given: Benjamin family: Eysenbach - given: Ruslan family: Salakhutdinov editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16691-16723 id: ni22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16691 lastpage: 16723 published: 2022-06-28 00:00:00 +0000 - title: 'Optimal Estimation of Policy Gradient via Double Fitted Iteration' abstract: 'Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventional methods for off-policy PG estimation often suffer from either significant bias or exponentially large variance. In this paper, we propose the double Fitted PG estimation (FPG) algorithm. FPG can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. 
In the case of linear value function approximation, we provide a tight finite-sample upper bound on policy gradient estimation error, that is governed by the amount of distribution mismatch measured in feature space. We also establish the asymptotic normality of FPG estimation error with a precise covariance characterization, which is further shown to be statistically optimal with a matching Cramer-Rao lower bound. Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques.' volume: 162 URL: https://proceedings.mlr.press/v162/ni22b.html PDF: https://proceedings.mlr.press/v162/ni22b/ni22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ni22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chengzhuo family: Ni - given: Ruiqi family: Zhang - given: Xiang family: Ji - given: Xuezhou family: Zhang - given: Mengdi family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16724-16783 id: ni22b issued: date-parts: - 2022 - 6 - 28 firstpage: 16724 lastpage: 16783 published: 2022-06-28 00:00:00 +0000 - title: 'GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models' abstract: 'Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5 billion parameter text-conditional diffusion model using classifier-free guidance are favored by human evaluators to those from DALL-E, even when the latter uses expensive CLIP reranking. Additionally, we find that our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and weights at https://github.com/openai/glide-text2im.' 
volume: 162 URL: https://proceedings.mlr.press/v162/nichol22a.html PDF: https://proceedings.mlr.press/v162/nichol22a/nichol22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nichol22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexander Quinn family: Nichol - given: Prafulla family: Dhariwal - given: Aditya family: Ramesh - given: Pranav family: Shyam - given: Pamela family: Mishkin - given: Bob family: Mcgrew - given: Ilya family: Sutskever - given: Mark family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16784-16804 id: nichol22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16784 lastpage: 16804 published: 2022-06-28 00:00:00 +0000 - title: 'Diffusion Models for Adversarial Purification' abstract: 'Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods do not make assumptions on the form of attack and the classification model, and thus can defend pre-existing classifiers against unseen threats. However, their performance currently falls behind adversarial training methods. In this work, we propose DiffPure that uses diffusion models for adversarial purification: Given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process. To evaluate our method against strong adaptive attacks in an efficient and scalable way, we propose to use the adjoint method to compute full gradients of the reverse generative process. Extensive experiments on three image datasets including CIFAR-10, ImageNet and CelebA-HQ with three classifier architectures including ResNet, WideResNet and ViT demonstrate that our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods, often by a large margin.' volume: 162 URL: https://proceedings.mlr.press/v162/nie22a.html PDF: https://proceedings.mlr.press/v162/nie22a/nie22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nie22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weili family: Nie - given: Brandon family: Guo - given: Yujia family: Huang - given: Chaowei family: Xiao - given: Arash family: Vahdat - given: Animashree family: Anandkumar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16805-16827 id: nie22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16805 lastpage: 16827 published: 2022-06-28 00:00:00 +0000 - title: 'The Primacy Bias in Deep Reinforcement Learning' abstract: 'This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. 
Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally-applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete (Atari 100k) and continuous action (DeepMind Control Suite) domains, consistently improving their performance.' volume: 162 URL: https://proceedings.mlr.press/v162/nikishin22a.html PDF: https://proceedings.mlr.press/v162/nikishin22a/nikishin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nikishin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Evgenii family: Nikishin - given: Max family: Schwarzer - given: Pierluca family: D’Oro - given: Pierre-Luc family: Bacon - given: Aaron family: Courville editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16828-16847 id: nikishin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16828 lastpage: 16847 published: 2022-06-28 00:00:00 +0000 - title: 'Causal Conceptions of Fairness and their Consequences' abstract: 'Recent work highlights the role of causality in designing equitable decision-making algorithms. It is not immediately clear, however, how existing causal conceptions of fairness relate to one another, or what the consequences are of using these definitions as design principles. Here, we first assemble and categorize popular causal definitions of algorithmic fairness into two broad families: (1) those that constrain the effects of decisions on counterfactual disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions almost always—in a measure theoretic sense—result in strongly Pareto dominated decision policies, meaning there is an alternative, unconstrained policy favored by every stakeholder with preferences drawn from a large, natural class. For example, in the case of college admissions decisions, policies constrained to satisfy causal fairness definitions would be disfavored by every stakeholder with neutral or positive preferences for both academic preparedness and diversity. Indeed, under a prominent definition of causal fairness, we prove the resulting policies require admitting all students with the same probability, regardless of academic qualifications or group membership. Our results highlight formal limitations and potential adverse consequences of common mathematical notions of causal fairness.' 
volume: 162 URL: https://proceedings.mlr.press/v162/nilforoshan22a.html PDF: https://proceedings.mlr.press/v162/nilforoshan22a/nilforoshan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nilforoshan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hamed family: Nilforoshan - given: Johann D family: Gaebler - given: Ravi family: Shroff - given: Sharad family: Goel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16848-16887 id: nilforoshan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16848 lastpage: 16887 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Test-Time Model Adaptation without Forgetting' abstract: 'Test-time adaptation provides an effective means of tackling the potential distribution shift between model training and inference, by dynamically updating the model at test time. This area has seen fast progress recently in the effectiveness of handling test shifts. Nonetheless, prior methods still suffer two key limitations: 1) these methods rely on performing backward computation for each test sample, which takes a considerable amount of time; and 2) these methods focus on improving the performance on out-of-distribution test samples and ignore that the adaptation on test data may result in a catastrophic forgetting issue, i.e., the performance on in-distribution test samples may degrade. To address these issues, we propose an efficient anti-forgetting test-time adaptation (EATA) method. Specifically, we devise a sample-efficient entropy minimization loss to exclude uninformative samples out of backward computation, which improves the overall efficiency and meanwhile boosts the out-of-distribution accuracy. Afterward, we introduce a regularization loss to ensure that critical model weights tend to be preserved during adaptation, thereby alleviating the forgetting issue. Extensive experiments on CIFAR-10-C, ImageNet-C, and ImageNet-R verify the effectiveness and superiority of our EATA.' volume: 162 URL: https://proceedings.mlr.press/v162/niu22a.html PDF: https://proceedings.mlr.press/v162/niu22a/niu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-niu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shuaicheng family: Niu - given: Jiaxiang family: Wu - given: Yifan family: Zhang - given: Yaofo family: Chen - given: Shijian family: Zheng - given: Peilin family: Zhao - given: Mingkui family: Tan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16888-16905 id: niu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16888 lastpage: 16905 published: 2022-06-28 00:00:00 +0000 - title: 'Generative Trees: Adversarial and Copycat' abstract: 'While Generative Adversarial Networks (GANs) achieve spectacular results on unstructured data like images, there is still a gap on tabular data, data for which state of the art supervised learning still favours decision tree (DT)-based models. 
This paper proposes a new path forward for the generation of tabular data, exploiting decades-old understanding of the supervised task’s best components for DT induction, from losses (properness), models (tree-based) to algorithms (boosting). The properness condition on the supervised loss – which postulates the optimality of Bayes rule – leads us to a variational GAN-style loss formulation which is tight when discriminators meet a calibration property trivially satisfied by DTs, and, under common assumptions about the supervised loss, yields "one loss to train against them all" for the generator: the $\chi^2$. We then introduce tree-based generative models, generative trees (GTs), meant to mirror on the generative side the good properties of DTs for classifying tabular data, with a boosting-compliant adversarial training algorithm for GTs. We also introduce copycat training, in which the generator copies at run time the underlying tree (graph) of the discriminator DT and completes it for the hardest discriminative task, with boosting compliant convergence. We test our algorithms on tasks including fake/real distinction and missing data imputation.' volume: 162 URL: https://proceedings.mlr.press/v162/nock22a.html PDF: https://proceedings.mlr.press/v162/nock22a/nock22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nock22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Richard family: Nock - given: Mathieu family: Guillame-Bert editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16906-16951 id: nock22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16906 lastpage: 16951 published: 2022-06-28 00:00:00 +0000 - title: 'Path-Aware and Structure-Preserving Generation of Synthetically Accessible Molecules' abstract: 'Computational chemistry aims to autonomously design specific molecules with target functionality. Generative frameworks provide useful tools to learn continuous representations of molecules in a latent space. While modelers could optimize chemical properties, many generated molecules are not synthesizable. To design synthetically accessible molecules that preserve main structural motifs of target molecules, we propose a reaction-embedded and structure-conditioned variational autoencoder. As the latent space jointly encodes molecular structures and their reaction routes, our new sampling method that measures the path-informed structural similarity allows us to effectively generate structurally analogous synthesizable molecules. When targeting out-of-domain as well as in-domain seed structures, our model generates structurally and property-wisely similar molecules equipped with well-defined reaction paths. By focusing on the important region in chemical space, we also demonstrate that our model can design new molecules with even higher activity than the seed molecules.' 
volume: 162 URL: https://proceedings.mlr.press/v162/noh22a.html PDF: https://proceedings.mlr.press/v162/noh22a/noh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-noh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Juhwan family: Noh - given: Dae-Woong family: Jeong - given: Kiyoung family: Kim - given: Sehui family: Han - given: Moontae family: Lee - given: Honglak family: Lee - given: Yousung family: Jung editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16952-16968 id: noh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16952 lastpage: 16968 published: 2022-06-28 00:00:00 +0000 - title: 'Utilizing Expert Features for Contrastive Learning of Time-Series Representations' abstract: 'We present an approach that incorporates expert knowledge for time-series representation learning. Our method employs expert features to replace the commonly used data transformations in previous contrastive learning approaches. We do this since time-series data frequently stems from the industrial or medical field where expert features are often available from domain experts, while transformations are generally elusive for time-series data. We start by proposing two properties that useful time-series representations should fulfill and show that current representation learning approaches do not ensure these properties. We therefore devise ExpCLR, a novel contrastive learning approach built on an objective that utilizes expert features to encourage both properties for the learned representation. Finally, we demonstrate on three real-world time-series datasets that ExpCLR surpasses several state-of-the-art methods for both unsupervised and semi-supervised representation learning.' volume: 162 URL: https://proceedings.mlr.press/v162/nonnenmacher22a.html PDF: https://proceedings.mlr.press/v162/nonnenmacher22a/nonnenmacher22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-nonnenmacher22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Manuel T family: Nonnenmacher - given: Lukas family: Oldenburg - given: Ingo family: Steinwart - given: David family: Reeb editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16969-16989 id: nonnenmacher22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16969 lastpage: 16989 published: 2022-06-28 00:00:00 +0000 - title: 'Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval' abstract: 'The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses and designing novel biotherapeutic proteins. Deep generative models of protein sequences trained on multiple sequence alignments have been the most successful approaches so far to address these tasks. 
The performance of these methods is however contingent on the availability of sufficiently deep and diverse alignments for reliable training. Their potential scope is thus limited by the fact many protein families are hard, if not impossible, to align. Large language models trained on massive quantities of non-aligned protein sequences from diverse families address these problems and show potential to eventually bridge the performance gap. We introduce Tranception, a novel transformer architecture leveraging autoregressive predictions and retrieval of homologous sequences at inference to achieve state-of-the-art fitness prediction performance. Given its markedly higher performance on multiple mutants, robustness to shallow alignments and ability to score indels, our approach offers significant gain of scope over existing approaches. To enable more rigorous model testing across a broader range of protein families, we develop ProteinGym – an extensive set of multiplexed assays of variant effects, substantially increasing both the number and diversity of assays compared to existing benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/notin22a.html PDF: https://proceedings.mlr.press/v162/notin22a/notin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-notin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pascal family: Notin - given: Mafalda family: Dias - given: Jonathan family: Frazer - given: Javier family: Marchena-Hurtado - given: Aidan N family: Gomez - given: Debora family: Marks - given: Yarin family: Gal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 16990-17017 id: notin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 16990 lastpage: 17017 published: 2022-06-28 00:00:00 +0000 - title: 'Fast Finite Width Neural Tangent Kernel' abstract: 'The Neural Tangent Kernel (NTK), defined as the outer product of the neural network (NN) Jacobians, has emerged as a central object of study in deep learning. In the infinite width limit, the NTK can sometimes be computed analytically and is useful for understanding training and generalization of NN architectures. At finite widths, the NTK is also used to better initialize NNs, compare the conditioning across models, perform architecture search, and do meta-learning. Unfortunately, the finite width NTK is notoriously expensive to compute, which severely limits its practical utility. We perform the first in-depth analysis of the compute and memory requirements for NTK computation in finite width networks. Leveraging the structure of neural networks, we further propose two novel algorithms that change the exponent of the compute and memory requirements of the finite width NTK, dramatically improving efficiency. Our algorithms can be applied in a black box fashion to any differentiable function, including those implementing neural networks. We open-source our implementations within the Neural Tangents package at https://github.com/google/neural-tangents.' 
volume: 162 URL: https://proceedings.mlr.press/v162/novak22a.html PDF: https://proceedings.mlr.press/v162/novak22a/novak22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-novak22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Roman family: Novak - given: Jascha family: Sohl-Dickstein - given: Samuel S family: Schoenholz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17018-17044 id: novak22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17018 lastpage: 17044 published: 2022-06-28 00:00:00 +0000 - title: 'Multicoated Supermasks Enhance Hidden Networks' abstract: 'Hidden Networks (Ramanujan et al., 2020) showed the possibility of finding accurate subnetworks within a randomly weighted neural network by training a connectivity mask, referred to as supermask. We show that the supermask stops improving even though gradients are not zero, thus underutilizing backpropagated information. To address this we propose a method that extends Hidden Networks by training an overlay of multiple hierarchical supermasks{—}a multicoated supermask. This method shows that using multiple supermasks for a single task achieves higher accuracy without additional training cost. Experiments on CIFAR-10 and ImageNet show that Multicoated Supermasks enhance the tradeoff between accuracy and model size. A ResNet-101 using a 7-coated supermask outperforms its Hidden Networks counterpart by 4%, matching the accuracy of a dense ResNet-50 while being an order of magnitude smaller.' volume: 162 URL: https://proceedings.mlr.press/v162/okoshi22a.html PDF: https://proceedings.mlr.press/v162/okoshi22a/okoshi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-okoshi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yasuyuki family: Okoshi - given: Ángel López family: Garcı́a-Arias - given: Kazutoshi family: Hirose - given: Kota family: Ando - given: Kazushi family: Kawamura - given: Thiem family: Van Chu - given: Masato family: Motomura - given: Jaehoon family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17045-17055 id: okoshi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17045 lastpage: 17055 published: 2022-06-28 00:00:00 +0000 - title: 'Generalized Leverage Scores: Geometric Interpretation and Applications' abstract: 'In problems involving matrix computations, the concept of leverage has found a large number of applications. In particular, leverage scores, which relate the columns of a matrix to the subspaces spanned by its leading singular vectors, are helpful in revealing column subsets to approximately factorize a matrix with quality guarantees. As such, they provide a solid foundation for a variety of machine-learning methods. In this paper we extend the definition of leverage scores to relate the columns of a matrix to arbitrary subsets of singular vectors. 
We establish a precise connection between column and singular-vector subsets, by relating the concepts of leverage scores and principal angles between subspaces. We employ this result to design approximation algorithms with provable guarantees for two well-known problems: generalized column subset selection and sparse canonical correlation analysis. We run numerical experiments to provide further insight on the proposed methods. The novel bounds we derive improve our understanding of fundamental concepts in matrix approximations. In addition, our insights may serve as building blocks for further contributions.' volume: 162 URL: https://proceedings.mlr.press/v162/ordozgoiti22a.html PDF: https://proceedings.mlr.press/v162/ordozgoiti22a/ordozgoiti22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ordozgoiti22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bruno family: Ordozgoiti - given: Antonis family: Matakos - given: Aristides family: Gionis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17056-17070 id: ordozgoiti22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17056 lastpage: 17070 published: 2022-06-28 00:00:00 +0000 - title: 'Practical Almost-Linear-Time Approximation Algorithms for Hybrid and Overlapping Graph Clustering' abstract: 'Detecting communities in real-world networks and clustering similarity graphs are major data mining tasks with a wide range of applications in graph mining, collaborative filtering, and bioinformatics. In many such applications, overwhelming empirical evidence suggests that communities and clusters are naturally overlapping, i.e., the boundary of a cluster may contain both edges across clusters and nodes that are shared with other clusters, calling for novel hybrid graph partitioning algorithms (HGP). While almost-linear-time approximation algorithms are known for edge-boundary-based graph partitioning, little progress has been made on fast algorithms for HGP, even in the special case of vertex-boundary-based graph partitioning. In this work, we introduce a framework based on two novel clustering objectives, which naturally extend the well-studied notion of conductance to clusters with hybrid vertex- and edge-boundary structure. Our main algorithmic contributions are almost-linear-time O(log n)-approximation algorithms for both these objectives. To this end, we show that the cut-matching framework of (Khandekar et al., 2014) can be significantly extended to incorporate hybrid partitions. Crucially, we implement our approximation algorithm to produce both hybrid partitions and optimality certificates for large graphs, easily scaling to tens of millions of edges, and test our implementation on real-world datasets against other competitive baselines.' 
volume: 162 URL: https://proceedings.mlr.press/v162/orecchia22a.html PDF: https://proceedings.mlr.press/v162/orecchia22a/orecchia22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-orecchia22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lorenzo family: Orecchia - given: Konstantinos family: Ameranis - given: Charalampos family: Tsourakakis - given: Kunal family: Talwar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17071-17093 id: orecchia22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17071 lastpage: 17093 published: 2022-06-28 00:00:00 +0000 - title: 'Anticorrelated Noise Injection for Improved Generalization' abstract: 'Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models. Usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance. In this paper, we zoom in on the problem of correlating the perturbations of consecutive PGD steps. We consider a variety of objective functions for which we find that GD with anticorrelated perturbations ("Anti-PGD") generalizes significantly better than GD and standard (uncorrelated) PGD. To support these experimental findings, we also derive a theoretical analysis that demonstrates that Anti-PGD moves to wider minima, while GD and PGD remain stuck in suboptimal regions or even diverge. This new connection between anticorrelated noise and generalization opens the field to novel ways to exploit noise for training machine learning models.' volume: 162 URL: https://proceedings.mlr.press/v162/orvieto22a.html PDF: https://proceedings.mlr.press/v162/orvieto22a/orvieto22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-orvieto22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Antonio family: Orvieto - given: Hans family: Kersting - given: Frank family: Proske - given: Francis family: Bach - given: Aurelien family: Lucchi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17094-17116 id: orvieto22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17094 lastpage: 17116 published: 2022-06-28 00:00:00 +0000 - title: 'Scalable Deep Gaussian Markov Random Fields for General Graphs' abstract: 'Machine learning methods on graphs have proven useful in many applications due to their ability to handle generally structured data. The framework of Gaussian Markov Random Fields (GMRFs) provides a principled way to define Gaussian models on graphs by utilizing their sparsity structure. We propose a flexible GMRF model for general graphs built on the multi-layer structure of Deep GMRFs, originally proposed for lattice graphs only. By designing a new type of layer we enable the model to scale to large graphs. 
The layer is constructed to allow for efficient training using variational inference and existing software frameworks for Graph Neural Networks. For a Gaussian likelihood, close to exact Bayesian inference is available for the latent field. This allows for making predictions with accompanying uncertainty estimates. The usefulness of the proposed model is verified by experiments on a number of synthetic and real world datasets, where it compares favorably to other both Bayesian and deep learning methods.' volume: 162 URL: https://proceedings.mlr.press/v162/oskarsson22a.html PDF: https://proceedings.mlr.press/v162/oskarsson22a/oskarsson22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-oskarsson22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Joel family: Oskarsson - given: Per family: Sidén - given: Fredrik family: Lindsten editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17117-17137 id: oskarsson22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17117 lastpage: 17137 published: 2022-06-28 00:00:00 +0000 - title: 'Zero-shot AutoML with Pretrained Models' abstract: 'Given a new dataset D and a low compute budget, how should we choose a pre-trained model to fine-tune to D, and set the fine-tuning hyperparameters without risking overfitting, particularly if D is small? Here, we extend automated machine learning (AutoML) to best make these choices. Our domain-independent meta-learning approach learns a zero-shot surrogate model which, at test time, allows to select the right deep learning (DL) pipeline (including the pre-trained model and fine-tuning hyperparameters) for a new dataset D given only trivial meta-features describing D such as image resolution or the number of classes. To train this zero-shot model, we collect performance data for many DL pipelines on a large collection of datasets and meta-train on this data to minimize a pairwise ranking objective. We evaluate our approach under the strict time limit of the vision track of the ChaLearn AutoDL challenge benchmark, clearly outperforming all challenge contenders.' volume: 162 URL: https://proceedings.mlr.press/v162/ozturk22a.html PDF: https://proceedings.mlr.press/v162/ozturk22a/ozturk22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ozturk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ekrem family: Öztürk - given: Fabio family: Ferreira - given: Hadi family: Jomaa - given: Lars family: Schmidt-Thieme - given: Josif family: Grabocka - given: Frank family: Hutter editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17138-17155 id: ozturk22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17138 lastpage: 17155 published: 2022-06-28 00:00:00 +0000 - title: 'History Compression via Language Models in Reinforcement Learning' abstract: 'In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP. 
We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. To avoid training of the Transformer, we introduce FrozenHopfield, which automatically associates observations with pretrained token embeddings. To form these associations, a modern Hopfield network stores these token embeddings, which are retrieved by queries that are obtained by a random but fixed projection of observations. Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer for history representation as a memory module. Since a representation of the past need not be learned, HELM is much more sample efficient than competitors. On Minigrid and Procgen environments HELM achieves new state-of-the-art results. Our code is available at https://github.com/ml-jku/helm.' volume: 162 URL: https://proceedings.mlr.press/v162/paischer22a.html PDF: https://proceedings.mlr.press/v162/paischer22a/paischer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-paischer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fabian family: Paischer - given: Thomas family: Adler - given: Vihang family: Patil - given: Angela family: Bitto-Nemling - given: Markus family: Holzleitner - given: Sebastian family: Lehner - given: Hamid family: Eghbal-Zadeh - given: Sepp family: Hochreiter editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17156-17185 id: paischer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17156 lastpage: 17185 published: 2022-06-28 00:00:00 +0000 - title: 'A Study on the Ramanujan Graph Property of Winning Lottery Tickets' abstract: 'Winning lottery tickets refer to sparse subgraphs of deep neural networks which have classification accuracy close to the original dense networks. Resilient connectivity properties of such sparse networks play an important role in their performance. The attempt is to identify a sparse and yet well-connected network to guarantee unhindered information flow. Connectivity in a graph is best characterized by its spectral expansion property. Ramanujan graphs are robust expanders which lead to sparse but highly-connected networks, and thus aid in studying the winning tickets. A feedforward neural network consists of a sequence of bipartite graphs representing its layers. We analyze the Ramanujan graph property of such bipartite layers in terms of their spectral characteristics using the Cheeger’s inequality for irregular graphs. It is empirically observed that the winning ticket networks preserve the Ramanujan graph property and achieve a high accuracy even when the layers are sparse. Accuracy and robustness to noise start declining as many of the layers lose the property. Next we find a robust winning lottery ticket by pruning individual layers while retaining their respective Ramanujan graph property. This strategy is observed to improve the performance of existing network pruning algorithms.' 
volume: 162 URL: https://proceedings.mlr.press/v162/pal22a.html PDF: https://proceedings.mlr.press/v162/pal22a/pal22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bithika family: Pal - given: Arindam family: Biswas - given: Sudeshna family: Kolay - given: Pabitra family: Mitra - given: Biswajit family: Basu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17186-17201 id: pal22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17186 lastpage: 17201 published: 2022-06-28 00:00:00 +0000 - title: 'On Learning Mixture of Linear Regressions in the Non-Realizable Setting' abstract: 'While mixture of linear regressions (MLR) is a well-studied topic, prior works usually do not analyze such models for prediction error. In fact, prediction and loss are not well-defined in the context of mixtures. In this paper, first we show that MLR can be used for prediction where instead of predicting a label, the model predicts a list of values (also known as list-decoding). The list size is equal to the number of components in the mixture, and the loss function is defined to be minimum among the losses resulted by all the component models. We show that with this definition, a solution of the empirical risk minimization (ERM) achieves small probability of prediction error. This begs for an algorithm to minimize the empirical risk for MLR, which is known to be computationally hard. Prior algorithmic works in MLR focus on the realizable setting, i.e., recovery of parameters when data is probabilistically generated by a mixed linear (noisy) model. In this paper we show that a version of the popular expectation minimization (EM) algorithm finds out the best fit lines in a dataset even when a realizable model is not assumed, under some regularity conditions on the dataset and the initial points, and thereby provides a solution for the ERM. We further provide an algorithm that runs in polynomial time in the number of datapoints, and recovers a good approximation of the best fit lines. The two algorithms are experimentally compared.' volume: 162 URL: https://proceedings.mlr.press/v162/pal22b.html PDF: https://proceedings.mlr.press/v162/pal22b/pal22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pal22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Soumyabrata family: Pal - given: Arya family: Mazumdar - given: Rajat family: Sen - given: Avishek family: Ghosh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17202-17220 id: pal22b issued: date-parts: - 2022 - 6 - 28 firstpage: 17202 lastpage: 17220 published: 2022-06-28 00:00:00 +0000 - title: 'Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification' abstract: 'Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets. 
However, as many real-world scenarios involve interaction among multiple agents, it is important to resolve offline RL in the multi-agent setting. Given the recent success of transferring online RL algorithms to the multi-agent setting, one may expect that offline RL algorithms will also transfer to multi-agent settings directly. Surprisingly, we empirically observe that conservative offline RL algorithms do not work well in the multi-agent setting—the performance degrades significantly with an increasing number of agents. Towards mitigating the degradation, we identify a key issue that non-concavity of the value function makes the policy gradient improvements prone to local optima. Multiple agents exacerbate the problem severely, since the suboptimal policy by any agent can lead to uncoordinated global failure. Following this intuition, we propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), which combines the first-order policy gradients and zeroth-order optimization methods to better optimize the conservative value functions over the actor parameters. Despite the simplicity, OMAR achieves state-of-the-art results in a variety of multi-agent control tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/pan22a.html PDF: https://proceedings.mlr.press/v162/pan22a/pan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ling family: Pan - given: Longbo family: Huang - given: Tengyu family: Ma - given: Huazhe family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17221-17237 id: pan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17221 lastpage: 17237 published: 2022-06-28 00:00:00 +0000 - title: 'A Unified Weight Initialization Paradigm for Tensorial Convolutional Neural Networks' abstract: 'Tensorial Convolutional Neural Networks (TCNNs) have attracted much research attention for their power in reducing model parameters or enhancing the generalization ability. However, exploration of TCNNs is hindered even from weight initialization methods. To be specific, general initialization methods, such as Xavier or Kaiming initialization, usually fail to generate appropriate weights for TCNNs. Meanwhile, although there are ad-hoc approaches for specific architectures (e.g., Tensor Ring Nets), they are not applicable to TCNNs with other tensor decomposition methods (e.g., CP or Tucker decomposition). To address this problem, we propose a universal weight initialization paradigm, which generalizes Xavier and Kaiming methods and can be widely applicable to arbitrary TCNNs. Specifically, we first present the Reproducing Transformation to convert the backward process in TCNNs to an equivalent convolution process. Then, based on the convolution operators in the forward and backward processes, we build a unified paradigm to control the variance of features and gradients in TCNNs. Thus, we can derive fan-in and fan-out initialization for various TCNNs. We demonstrate that our paradigm can stabilize the training of TCNNs, leading to faster convergence and better results.' 
volume: 162 URL: https://proceedings.mlr.press/v162/pan22b.html PDF: https://proceedings.mlr.press/v162/pan22b/pan22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pan22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yu family: Pan - given: Zeyong family: Su - given: Ao family: Liu - given: Wang family: Jingquan - given: Nannan family: Li - given: Zenglin family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17238-17257 id: pan22b issued: date-parts: - 2022 - 6 - 28 firstpage: 17238 lastpage: 17257 published: 2022-06-28 00:00:00 +0000 - title: 'Robustness and Accuracy Could Be Reconcilable by (Proper) Definition' abstract: 'The trade-off between robustness and accuracy has been widely studied in the adversarial literature. Although still controversial, the prevailing view is that this trade-off is inherent, either empirically or theoretically. Thus, we dig for the origin of this trade-off in adversarial training and find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance — an overcorrection towards smoothness. Given this, we advocate employing local equivariance to describe the ideal behavior of a robust model, leading to a self-consistent robust error named SCORE. By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty via robust optimization. By simply substituting KL divergence with variants of distance metrics, SCORE can be efficiently minimized. Empirically, our models achieve top-rank performance on RobustBench under AutoAttack. Besides, SCORE provides instructive insights for explaining the overfitting phenomenon and semantic input gradients observed on robust models.' volume: 162 URL: https://proceedings.mlr.press/v162/pang22a.html PDF: https://proceedings.mlr.press/v162/pang22a/pang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianyu family: Pang - given: Min family: Lin - given: Xiao family: Yang - given: Jun family: Zhu - given: Shuicheng family: Yan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17258-17277 id: pang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17258 lastpage: 17277 published: 2022-06-28 00:00:00 +0000 - title: 'Towards Coherent and Consistent Use of Entities in Narrative Generation' abstract: 'Large pre-trained language models (LMs) have demonstrated impressive capabilities in generating long, fluent text; however, there is little to no analysis on their ability to maintain entity coherence and consistency. In this work, we focus on the end task of narrative generation and systematically analyse the long-range entity coherence and consistency in generated stories. First, we propose a set of automatic metrics for measuring model performance in terms of entity usage. Given these metrics, we quantify the limitations of current LMs. 
Next, we propose augmenting a pre-trained LM with a dynamic entity memory in an end-to-end manner by using an auxiliary entity-related loss for guiding the reads and writes to the memory. We demonstrate that the dynamic entity memory increases entity coherence according to both automatic and human judgment and helps preserve entity-related information, especially in settings with a limited context window. Finally, we also validate that our automatic metrics are correlated with human ratings and serve as a good indicator of the quality of generated stories.' volume: 162 URL: https://proceedings.mlr.press/v162/papalampidi22a.html PDF: https://proceedings.mlr.press/v162/papalampidi22a/papalampidi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-papalampidi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pinelopi family: Papalampidi - given: Kris family: Cao - given: Tomas family: Kocisky editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17278-17294 id: papalampidi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17278 lastpage: 17294 published: 2022-06-28 00:00:00 +0000 - title: 'Constrained Discrete Black-Box Optimization using Mixed-Integer Programming' abstract: 'Discrete black-box optimization problems are challenging for model-based optimization (MBO) algorithms, such as Bayesian optimization, due to the size of the search space and the need to satisfy combinatorial constraints. In particular, these methods require repeatedly solving a complex discrete global optimization problem in the inner loop, where popular heuristic inner-loop solvers introduce approximations and are difficult to adapt to combinatorial constraints. In response, we propose NN+MILP, a general discrete MBO framework using piecewise-linear neural networks as surrogate models and mixed-integer linear programming (MILP) to optimize the acquisition function. MILP provides optimality guarantees and a versatile declarative language for domain-specific constraints. We test our approach on a range of unconstrained and constrained problems, including DNA binding, constrained binary quadratic problems from the MINLPLib benchmark, and the NAS-Bench-101 neural architecture search benchmark. NN+MILP surpasses or matches the performance of black-box algorithms tailored to the constraints at hand, with global optimization of the acquisition problem running in a few minutes using only standard software packages and hardware.'
volume: 162 URL: https://proceedings.mlr.press/v162/papalexopoulos22a.html PDF: https://proceedings.mlr.press/v162/papalexopoulos22a/papalexopoulos22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-papalexopoulos22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Theodore P family: Papalexopoulos - given: Christian family: Tjandraatmadja - given: Ross family: Anderson - given: Juan Pablo family: Vielma - given: David family: Belanger editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17295-17322 id: papalexopoulos22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17295 lastpage: 17322 published: 2022-06-28 00:00:00 +0000 - title: 'A Theoretical Comparison of Graph Neural Network Extensions' abstract: 'We study and compare different Graph Neural Network extensions that increase the expressive power of GNNs beyond the Weisfeiler-Leman test. We focus on (i) GNNs based on higher order WL methods, (ii) GNNs that preprocess small substructures in the graph, (iii) GNNs that preprocess the graph up to a small radius, and (iv) GNNs that slightly perturb the graph to compute an embedding. We begin by presenting a simple improvement for this last extension that strictly increases the expressive power of this GNN variant. Then, as our main result, we compare the expressiveness of these extensions to each other through a series of example constructions that can be distinguished by one of the extensions, but not by another one. We also show negative examples that are particularly challenging for each of the extensions, and we prove several claims about the ability of these extensions to count cliques and cycles in the graph.' volume: 162 URL: https://proceedings.mlr.press/v162/papp22a.html PDF: https://proceedings.mlr.press/v162/papp22a/papp22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-papp22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pál András family: Papp - given: Roger family: Wattenhofer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17323-17345 id: papp22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17323 lastpage: 17345 published: 2022-06-28 00:00:00 +0000 - title: 'Validating Causal Inference Methods' abstract: 'The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based methods, and doubly robust methods. Unfortunately for applied researchers, there is no ‘one-size-fits-all’ causal method that can perform optimally universally. In practice, causal methods are primarily evaluated quantitatively on handcrafted simulated data. Such data-generative procedures can be of limited value because they are typically stylized models of reality. 
They are simplified for tractability and lack the complexities of real-world data. For applied researchers, it is critical to understand how well a method performs for the data at hand. Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. The framework’s novelty stems from its ability to generate synthetic data anchored at the empirical distribution for the observed sample, and therefore virtually indistinguishable from the latter. The approach allows the user to specify ground truth for the form and magnitude of causal effects and confounding bias as functions of covariates. Thus simulated data sets are used to evaluate the potential performance of various causal estimation methods when applied to data similar to the observed sample. We demonstrate Credence’s ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and two real-world data applications from Lalonde and Project STAR studies.' volume: 162 URL: https://proceedings.mlr.press/v162/parikh22a.html PDF: https://proceedings.mlr.press/v162/parikh22a/parikh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-parikh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Harsh family: Parikh - given: Carlos family: Varjao - given: Louise family: Xu - given: Eric Tchetgen family: Tchetgen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17346-17358 id: parikh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17346 lastpage: 17358 published: 2022-06-28 00:00:00 +0000 - title: 'The Unsurprising Effectiveness of Pre-Trained Vision Models for Control' abstract: 'Recent years have seen the emergence of pre-trained representations as a powerful abstraction for AI applications in computer vision, natural language, and speech. However, policy learning for control is still dominated by a tabula-rasa learning paradigm, with visuo-motor policies often trained from scratch using data from deployment environments. In this context, we revisit and study the role of pre-trained visual representations for control, and in particular representations trained on large-scale computer vision datasets. Through extensive empirical evaluation in diverse control domains (Habitat, DeepMind Control, Adroit, Franka Kitchen), we isolate and study the importance of different representation training methods, data augmentations, and feature hierarchies. Overall, we find that pre-trained visual representations can be competitive or even better than ground-truth state representations to train control policies. This is in spite of using only out-of-domain data from standard vision datasets, without any in-domain data from the deployment environments.' 
volume: 162 URL: https://proceedings.mlr.press/v162/parisi22a.html PDF: https://proceedings.mlr.press/v162/parisi22a/parisi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-parisi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Simone family: Parisi - given: Aravind family: Rajeswaran - given: Senthil family: Purushwalkam - given: Abhinav family: Gupta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17359-17371 id: parisi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17359 lastpage: 17371 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Symmetric Embeddings for Equivariant World Models' abstract: 'Incorporating symmetries can lead to highly data-efficient and generalizable models by defining equivalence classes of data samples related by transformations. However, characterizing how transformations act on input data is often difficult, limiting the applicability of equivariant models. We propose learning symmetric embedding networks (SENs) that encode an input space (e.g. images), where we do not know the effect of transformations (e.g. rotations), to a feature space that transforms in a known manner under these operations. This network can be trained end-to-end with an equivariant task network to learn an explicitly symmetric representation. We validate this approach in the context of equivariant transition models with 3 distinct forms of symmetry. Our experiments demonstrate that SENs facilitate the application of equivariant networks to data with complex symmetry representations. Moreover, doing so can yield improvements in accuracy and generalization relative to both fully-equivariant and non-equivariant baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/park22a.html PDF: https://proceedings.mlr.press/v162/park22a/park22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-park22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jung Yeon family: Park - given: Ondrej family: Biza - given: Linfeng family: Zhao - given: Jan-Willem family: Van De Meent - given: Robin family: Walters editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17372-17389 id: park22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17372 lastpage: 17389 published: 2022-06-28 00:00:00 +0000 - title: 'Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness' abstract: 'Neural network ensembles, such as Bayesian neural networks (BNNs), have shown success in the areas of uncertainty estimation and robustness. However, a crucial challenge prohibits their use in practice. BNNs require a large number of predictions to produce reliable results, leading to a significant increase in computational cost. To alleviate this issue, we propose spatial smoothing, a method that ensembles neighboring feature map points of convolutional neural networks. 
By simply adding a few blur layers to the models, we empirically show that spatial smoothing improves accuracy, uncertainty estimation, and robustness of BNNs across a whole range of ensemble sizes. In particular, BNNs incorporating spatial smoothing achieve high predictive performance merely with a handful of ensembles. Moreover, this method can also be applied to canonical deterministic neural networks to improve their performance. Several lines of evidence suggest that the improvements can be attributed to the stabilized feature maps and the smoothing of the loss landscape. In addition, we provide a fundamental explanation for prior works, namely global average pooling, pre-activation, and ReLU6, by addressing them as special cases of spatial smoothing. These not only enhance accuracy, but also improve uncertainty estimation and robustness by making the loss landscape smoother in the same manner as spatial smoothing. The code is available at https://github.com/xxxnell/spatial-smoothing.' volume: 162 URL: https://proceedings.mlr.press/v162/park22b.html PDF: https://proceedings.mlr.press/v162/park22b/park22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-park22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Namuk family: Park - given: Songkuk family: Kim editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17390-17419 id: park22b issued: date-parts: - 2022 - 6 - 28 firstpage: 17390 lastpage: 17419 published: 2022-06-28 00:00:00 +0000 - title: 'Exact Optimal Accelerated Complexity for Fixed-Point Iterations' abstract: 'Despite the broad use of fixed-point iterations throughout applied mathematics, the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not been established. This work presents an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition. We then provide matching complexity lower bounds to establish the exact optimality of the acceleration mechanisms in the nonexpansive and contractive setups. Finally, we provide experiments with CT imaging, optimal transport, and decentralized optimization to demonstrate the practical effectiveness of the acceleration mechanism.'
volume: 162 URL: https://proceedings.mlr.press/v162/park22c.html PDF: https://proceedings.mlr.press/v162/park22c/park22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-park22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jisun family: Park - given: Ernest K family: Ryu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17420-17457 id: park22c issued: date-parts: - 2022 - 6 - 28 firstpage: 17420 lastpage: 17457 published: 2022-06-28 00:00:00 +0000 - title: 'Kernel Methods for Radial Transformed Compositional Data with Many Zeros' abstract: 'Compositional data analysis with a high proportion of zeros has gained increasing popularity, especially in chemometrics and human gut microbiomes research. Statistical analyses of this type of data are typically carried out via a log-ratio transformation after replacing zeros with small positive values. We should note, however, that this procedure is geometrically improper, as it causes anomalous distortions through the transformation. We propose a radial transformation that does not require zero substitutions and more importantly results in essential equivalence between domains before and after the transformation. We show that a rich class of kernels on hyperspheres can successfully define a kernel embedding for compositional data based on this equivalence. To the best of our knowledge, this is the first work that theoretically establishes the availability of the extensive library of kernel-based machine learning methods for compositional data. The applicability of the proposed approach is demonstrated with kernel principal component analysis.' volume: 162 URL: https://proceedings.mlr.press/v162/park22d.html PDF: https://proceedings.mlr.press/v162/park22d/park22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-park22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junyoung family: Park - given: Changwon family: Yoon - given: Cheolwoo family: Park - given: Jeongyoun family: Ahn editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17458-17472 id: park22d issued: date-parts: - 2022 - 6 - 28 firstpage: 17458 lastpage: 17472 published: 2022-06-28 00:00:00 +0000 - title: 'Evolving Curricula with Regret-Based Environment Design' abstract: 'Training generally-capable agents with reinforcement learning (RL) remains a significant challenge. A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent’s capabilities. These methods benefit from theoretical robustness guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces in practice. 
By contrast, evolutionary approaches incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. This work proposes harnessing the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent’s capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of this paper is available at https://accelagent.github.io.' volume: 162 URL: https://proceedings.mlr.press/v162/parker-holder22a.html PDF: https://proceedings.mlr.press/v162/parker-holder22a/parker-holder22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-parker-holder22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jack family: Parker-Holder - given: Minqi family: Jiang - given: Michael family: Dennis - given: Mikayel family: Samvelyan - given: Jakob family: Foerster - given: Edward family: Grefenstette - given: Tim family: Rocktäschel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17473-17498 id: parker-holder22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17473 lastpage: 17498 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps' abstract: 'Neural Language Models (NLMs) have made tremendous advances during the last years, achieving impressive performance on various linguistic tasks. Capitalizing on this, studies in neuroscience have started to use NLMs to study neural activity in the human brain during language processing. However, many questions remain unanswered regarding which factors determine the ability of a neural language model to capture brain activity (aka its ’brain score’). Here, we make first steps in this direction and examine the impact of test loss, training corpus and model architecture (comparing GloVe, LSTM, GPT-2 and BERT), on the prediction of functional Magnetic Resonance Imaging time-courses of participants listening to an audiobook. We find that (1) untrained versions of each model already explain significant amount of signal in the brain by capturing similarity in brain responses across identical words, with the untrained LSTM outperforming the transformer-based models, being less impacted by the effect of context; (2) that training NLP models improves brain scores in the same brain regions irrespective of the model’s architecture; (3) that Perplexity (test loss) is not a good predictor of brain score; (4) that training data have a strong influence on the outcome and, notably, that off-the-shelf models may lack statistical power to detect brain activations. Overall, we outline the impact of model-training choices, and suggest good practices for future studies aiming at explaining the human language system using neural language models.' 
volume: 162 URL: https://proceedings.mlr.press/v162/pasquiou22a.html PDF: https://proceedings.mlr.press/v162/pasquiou22a/pasquiou22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pasquiou22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexandre family: Pasquiou - given: Yair family: Lakretz - given: John T family: Hale - given: Bertrand family: Thirion - given: Christophe family: Pallier editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17499-17516 id: pasquiou22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17499 lastpage: 17516 published: 2022-06-28 00:00:00 +0000 - title: 'A new similarity measure for covariate shift with applications to nonparametric regression' abstract: 'We study covariate shift in the context of nonparametric regression. We introduce a new measure of distribution mismatch between the source and target distributions using the integrated ratio of probabilities of balls at a given radius. We use the scaling of this measure with respect to the radius to characterize the minimax rate of estimation over a family of Hölder continuous functions under covariate shift. In comparison to the recently proposed notion of transfer exponent, this measure leads to a sharper rate of convergence and is more fine-grained. We accompany our theory with concrete instances of covariate shift that illustrate this sharp difference.' volume: 162 URL: https://proceedings.mlr.press/v162/pathak22a.html PDF: https://proceedings.mlr.press/v162/pathak22a/pathak22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pathak22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Reese family: Pathak - given: Cong family: Ma - given: Martin family: Wainwright editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17517-17530 id: pathak22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17517 lastpage: 17530 published: 2022-06-28 00:00:00 +0000 - title: 'Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution' abstract: 'Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only a few episodes with high rewards are available as demonstrations since current exploration strategies cannot discover them in reasonable time. In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. Consequently, Align-RUDDER employs reward redistribution effectively and, thereby, drastically improves learning on few demonstrations. Align-RUDDER outperforms competitors on complex artificial tasks with delayed rewards and few demonstrations. On the Minecraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently.
Code is available at github.com/ml-jku/align-rudder.' volume: 162 URL: https://proceedings.mlr.press/v162/patil22a.html PDF: https://proceedings.mlr.press/v162/patil22a/patil22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-patil22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vihang family: Patil - given: Markus family: Hofmarcher - given: Marius-Constantin family: Dinu - given: Matthias family: Dorfer - given: Patrick M family: Blies - given: Johannes family: Brandstetter - given: José family: Arjona-Medina - given: Sepp family: Hochreiter editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17531-17572 id: patil22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17531 lastpage: 17572 published: 2022-06-28 00:00:00 +0000 - title: 'POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging' abstract: 'Fine-tuning models on edge devices like mobile phones would enable privacy-preserving personalization over sensitive data. However, edge training has historically been limited to relatively small models with simple architectures because training is both memory and energy intensive. We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices. POET jointly optimizes the integrated search spaces of rematerialization and paging, two algorithms to reduce the memory consumption of backpropagation. Given a memory budget and a run-time constraint, we formulate a mixed-integer linear program (MILP) for energy-optimal training. Our approach enables training significantly larger models on embedded devices while reducing energy consumption and without modifying the mathematical correctness of backpropagation. We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency. POET is an open-source project available at https://github.com/ShishirPatil/poet' volume: 162 URL: https://proceedings.mlr.press/v162/patil22b.html PDF: https://proceedings.mlr.press/v162/patil22b/patil22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-patil22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shishir G. family: Patil - given: Paras family: Jain - given: Prabal family: Dutta - given: Ion family: Stoica - given: Joseph family: Gonzalez editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17573-17583 id: patil22b issued: date-parts: - 2022 - 6 - 28 firstpage: 17573 lastpage: 17583 published: 2022-06-28 00:00:00 +0000 - title: 'Learning to Cut by Looking Ahead: Cutting Plane Selection via Imitation Learning' abstract: 'Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value.
For selecting cuts, modern solvers rely on manually designed heuristics that are tuned to gauge the potential effectiveness of cuts. We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection – but is too expensive to be deployed in practice. In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. Our model outperforms standard baselines for cut selection on several synthetic MILP benchmarks. Experiments on a realistic B&C solver further validate our approach, and exhibit the potential of learning methods in this setting.' volume: 162 URL: https://proceedings.mlr.press/v162/paulus22a.html PDF: https://proceedings.mlr.press/v162/paulus22a/paulus22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-paulus22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Max B family: Paulus - given: Giulia family: Zarpellon - given: Andreas family: Krause - given: Laurent family: Charlin - given: Chris family: Maddison editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17584-17600 id: paulus22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17584 lastpage: 17600 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Network Pruning Denoises the Features and Makes Local Connectivity Emerge in Visual Tasks' abstract: 'Pruning methods can considerably reduce the size of artificial neural networks without harming their performance and in some cases they can even uncover sub-networks that, when trained in isolation, match or surpass the test accuracy of their dense counterparts. Here, we characterize the inductive bias that pruning imprints in such "winning lottery tickets": focusing on visual tasks, we analyze the architecture resulting from iterative magnitude pruning of a simple fully connected network. We show that the surviving node connectivity is local in input space, and organized in patterns reminiscent of the ones found in convolutional networks. We investigate the role played by data and tasks in shaping the architecture of the pruned sub-network. We find that pruning performances, and the ability to sift out the noise and make local features emerge, improve by increasing the size of the training set, and the semantic value of the data. We also study different pruning procedures, and find that iterative magnitude pruning is particularly effective in distilling meaningful connectivity out of features present in the original task. Our results suggest the possibility to automatically discover new and efficient architectural inductive biases in other datasets and tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/pellegrini22a.html PDF: https://proceedings.mlr.press/v162/pellegrini22a/pellegrini22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pellegrini22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Franco family: Pellegrini - given: Giulio family: Biroli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17601-17626 id: pellegrini22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17601 lastpage: 17626 published: 2022-06-28 00:00:00 +0000 - title: 'Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding' abstract: 'Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable and customizable encoder alternative, Branchformer, with parallel branches for modeling various ranged dependencies in end-to-end speech processing. In each encoder layer, one branch employs self-attention or its variant to capture long-range dependencies, while the other branch utilizes an MLP module with convolutional gating (cgMLP) to extract local relationships. We conduct experiments on several speech recognition and spoken language understanding benchmarks. Results show that our model outperforms both Transformer and cgMLP. It also matches or outperforms state-of-the-art results achieved by Conformer. Furthermore, we show various strategies to reduce computation thanks to the two-branch architecture, including the ability to have variable inference complexity in a single trained model. The weights learned for merging branches indicate how local and global dependencies are utilized in different layers, which benefits model design.' volume: 162 URL: https://proceedings.mlr.press/v162/peng22a.html PDF: https://proceedings.mlr.press/v162/peng22a/peng22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-peng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yifan family: Peng - given: Siddharth family: Dalmia - given: Ian family: Lane - given: Shinji family: Watanabe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17627-17643 id: peng22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17627 lastpage: 17643 published: 2022-06-28 00:00:00 +0000 - title: 'Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets' abstract: 'Deep generative models have achieved tremendous success in designing novel drug molecules in recent years. A new thread of works has shown potential in advancing the specificity and success rate of in silico drug design by considering the structure of protein pockets. This setting poses fundamental computational challenges in sampling new chemical compounds that could satisfy multiple geometrical constraints imposed by pockets.
Previous sampling algorithms either sample in the graph space or only consider the 3D coordinates of atoms while ignoring other detailed chemical structures such as bond types and functional groups. To address the challenge, we develop an E(3)-equivariant generative network composed of two modules: 1) a new graph neural network capturing both spatial and bonding relationships between atoms of the binding pockets and 2) a new efficient algorithm which samples new drug candidates conditioned on the pocket representations from a tractable distribution without relying on MCMC. Experimental results demonstrate that molecules sampled from Pocket2Mol achieve significantly better binding affinity and other drug properties such as drug-likeness and synthetic accessibility.' volume: 162 URL: https://proceedings.mlr.press/v162/peng22b.html PDF: https://proceedings.mlr.press/v162/peng22b/peng22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-peng22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xingang family: Peng - given: Shitong family: Luo - given: Jiaqi family: Guan - given: Qi family: Xie - given: Jian family: Peng - given: Jianzhu family: Ma editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17644-17655 id: peng22b issued: date-parts: - 2022 - 6 - 28 firstpage: 17644 lastpage: 17655 published: 2022-06-28 00:00:00 +0000 - title: 'Differentiable Top-k Classification Learning' abstract: 'The top-k classification accuracy is one of the core metrics in machine learning. Here, k is conventionally a positive integer, such as 1 or 5, leading to top-1 or top-5 training objectives. In this work, we relax this assumption and optimize the model for multiple k simultaneously instead of using a single k. Leveraging recent advances in differentiable sorting and ranking, we propose a family of differentiable top-k cross-entropy classification losses. This allows training while not only considering the top-1 prediction, but also, e.g., the top-2 and top-5 predictions. We evaluate the proposed losses for fine-tuning on state-of-the-art architectures, as well as for training from scratch. We find that relaxing k not only produces better top-5 accuracies, but also leads to top-1 accuracy improvements. When fine-tuning publicly available ImageNet models, we achieve a new state-of-the-art for these models.' 
volume: 162 URL: https://proceedings.mlr.press/v162/petersen22a.html PDF: https://proceedings.mlr.press/v162/petersen22a/petersen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-petersen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Felix family: Petersen - given: Hilde family: Kuehne - given: Christian family: Borgelt - given: Oliver family: Deussen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17656-17668 id: petersen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17656 lastpage: 17668 published: 2022-06-28 00:00:00 +0000 - title: 'Multi-scale Feature Learning Dynamics: Insights for Double Descent' abstract: 'An intriguing phenomenon that arises from the high-dimensional learning dynamics of neural networks is the phenomenon of “double descent”. The more commonly studied aspect of this phenomenon corresponds to model-wise double descent where the test error exhibits a second descent with increasing model complexity, beyond the classical U-shaped error curve. In this work, we investigate the origins of the less studied epoch-wise double descent in which the test error undergoes two non-monotonous transitions, or descents, as the training time increases. We study a linear teacher-student setup exhibiting epoch-wise double descent similar to that in deep neural networks. In this setting, we derive closed-form analytical expressions describing the generalization error in terms of low-dimensional scalar macroscopic variables. We find that double descent can be attributed to distinct features being learned at different scales: as fast-learning features overfit, slower-learning features start to fit, resulting in a second descent in test error. We validate our findings through numerical simulations where our theory accurately predicts empirical findings and remains consistent with observations in deep neural networks.' volume: 162 URL: https://proceedings.mlr.press/v162/pezeshki22a.html PDF: https://proceedings.mlr.press/v162/pezeshki22a/pezeshki22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pezeshki22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mohammad family: Pezeshki - given: Amartya family: Mitra - given: Yoshua family: Bengio - given: Guillaume family: Lajoie editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17669-17690 id: pezeshki22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17669 lastpage: 17690 published: 2022-06-28 00:00:00 +0000 - title: 'A Differential Entropy Estimator for Training Neural Networks' abstract: 'Mutual Information (MI) has been widely used as a loss regularizer for training neural networks. This has been particularly effective when learning disentangled or compressed representations of high dimensional data. However, differential entropy (DE), another fundamental measure of information, has not found widespread use in neural network training.
Although DE offers a potentially wider range of applications than MI, off-the-shelf DE estimators are either non-differentiable, computationally intractable, or fail to adapt to changes in the underlying distribution. These drawbacks prevent them from being used as regularizers in neural network training. To address shortcomings in previously proposed estimators for DE, here we introduce KNIFE, a fully parameterized, differentiable kernel-based estimator of DE. The flexibility of our approach also allows us to construct KNIFE-based estimators for conditional (on either discrete or continuous variables) DE, as well as MI. We empirically validate our method on high-dimensional synthetic data and further apply it to guide the training of neural networks for real-world tasks. Our experiments on a large variety of tasks, including visual domain adaptation, textual fair classification, and textual fine-tuning, demonstrate the effectiveness of KNIFE-based estimation. Code can be found at https://github.com/g-pichler/knife.' volume: 162 URL: https://proceedings.mlr.press/v162/pichler22a.html PDF: https://proceedings.mlr.press/v162/pichler22a/pichler22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pichler22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Georg family: Pichler - given: Pierre Jean A. family: Colombo - given: Malik family: Boudiaf - given: Günther family: Koliander - given: Pablo family: Piantanida editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17691-17715 id: pichler22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17691 lastpage: 17715 published: 2022-06-28 00:00:00 +0000 - title: 'Federated Learning with Partial Model Personalization' abstract: 'We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the literature, but their convergence properties are not fully understood, especially for the alternating variant. We provide convergence analyses of both algorithms in the general nonconvex setting with partial participation and delineate the regime where one dominates the other. Our experiments on real-world image, text, and speech datasets demonstrate that (a) partial personalization can obtain most of the benefits of full model personalization with a small fraction of personal parameters, and (b) the alternating update algorithm outperforms the simultaneous update algorithm by a small but consistent margin.'
volume: 162 URL: https://proceedings.mlr.press/v162/pillutla22a.html PDF: https://proceedings.mlr.press/v162/pillutla22a/pillutla22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pillutla22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Krishna family: Pillutla - given: Kshitiz family: Malik - given: Abdel-Rahman family: Mohamed - given: Mike family: Rabbat - given: Maziar family: Sanjabi - given: Lin family: Xiao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17716-17758 id: pillutla22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17716 lastpage: 17758 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry' abstract: 'We systematize the approach to the investigation of deep neural network landscapes by basing it on the geometry of the space of implemented functions rather than the space of parameters. Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On this space, we explore the error landscape rather than the loss. This lets us derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them. Using different optimization algorithms that sample minimizers with different flatness we study the mode connectivity and relative distances. Testing a variety of state-of-the-art architectures and benchmark datasets, we confirm the correlation between flatness and generalization performance; we further show that in function space flatter minima are closer to each other and that the barriers along the geodesics connecting them are small. We also find that minimizers found by variants of gradient descent can be connected by zero-error paths composed of two straight lines in parameter space, i.e. polygonal chains with a single bend. We observe similar qualitative results in neural networks with binary weights and activations, providing one of the first results concerning the connectivity in this setting. Our results hinge on symmetry removal, and are in remarkable agreement with the rich phenomenology described by some recent analytical studies performed on simple shallow models.' 
volume: 162 URL: https://proceedings.mlr.press/v162/pittorino22a.html PDF: https://proceedings.mlr.press/v162/pittorino22a/pittorino22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pittorino22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fabrizio family: Pittorino - given: Antonio family: Ferraro - given: Gabriele family: Perugini - given: Christoph family: Feinauer - given: Carlo family: Baldassi - given: Riccardo family: Zecchina editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17759-17781 id: pittorino22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17759 lastpage: 17781 published: 2022-06-28 00:00:00 +0000 - title: 'Geometric Multimodal Contrastive Representation Learning' abstract: 'Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture consisting of modality-specific base encoders, allowing to process an arbitrary number of modalities to an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/poklukar22a.html PDF: https://proceedings.mlr.press/v162/poklukar22a/poklukar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-poklukar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Petra family: Poklukar - given: Miguel family: Vasco - given: Hang family: Yin - given: Francisco S. family: Melo - given: Ana family: Paiva - given: Danica family: Kragic editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17782-17800 id: poklukar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17782 lastpage: 17800 published: 2022-06-28 00:00:00 +0000 - title: 'Constrained Offline Policy Optimization' abstract: 'In this work we introduce Constrained Offline Policy Optimization (COPO), an offline policy optimization algorithm for learning in MDPs with cost constraints. COPO is built upon a novel offline cost-projection method, which we formally derive and analyze. Our method improves upon the state-of-the-art in offline constrained policy optimization by explicitly accounting for distributional shift and by offering non-asymptotic confidence bounds on the cost of a policy. 
These formal properties are superior to those of existing techniques, which only guarantee convergence to a point estimate. We formally analyze our method and empirically demonstrate that it achieves state-of-the-art performance on discrete and continuous control problems, while offering the aforementioned improved, stronger, and more robust theoretical guarantees.' volume: 162 URL: https://proceedings.mlr.press/v162/polosky22a.html PDF: https://proceedings.mlr.press/v162/polosky22a/polosky22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-polosky22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nicholas family: Polosky - given: Bruno C. Da family: Silva - given: Madalina family: Fiterau - given: Jithin family: Jagannath editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17801-17810 id: polosky22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17801 lastpage: 17810 published: 2022-06-28 00:00:00 +0000 - title: 'Offline Meta-Reinforcement Learning with Online Self-Supervision' abstract: 'Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond online meta-RL or standard offline RL settings. Meta-RL learns an exploration strategy that collects data for adapting, and also meta-trains a policy that quickly adapts to data from a new task. Since this policy was meta-trained on a fixed, offline dataset, it might behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels to bridge this distribution shift. By not requiring reward labels for online collection, this data can be much cheaper to collect. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that using additional unsupervised online data collection leads to a dramatic improvement in the adaptive capabilities of the meta-trained policies, matching the performance of fully online meta-RL on a range of challenging domains that require generalization to new tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/pong22a.html PDF: https://proceedings.mlr.press/v162/pong22a/pong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vitchyr H family: Pong - given: Ashvin V family: Nair - given: Laura M family: Smith - given: Catherine family: Huang - given: Sergey family: Levine editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17811-17829 id: pong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17811 lastpage: 17829 published: 2022-06-28 00:00:00 +0000 - title: 'Debiaser Beware: Pitfalls of Centering Regularized Transport Maps' abstract: 'Estimating optimal transport (OT) maps (a.k.a. Monge maps) between two measures P and Q is a problem fraught with computational and statistical challenges. A promising approach lies in using the dual potential functions obtained when solving an entropy-regularized OT problem between samples P_n and Q_n, which can be used to recover an approximately optimal map. The negentropy penalization in that scheme introduces, however, an estimation bias that grows with the regularization strength. A well-known remedy to debias such estimates, which has gained wide popularity among practitioners of regularized OT, is to center them, by subtracting auxiliary problems involving P_n and itself, as well as Q_n and itself. We do prove that, under favorable conditions on P and Q, debiasing can yield better approximations to the Monge map. However, and perhaps surprisingly, we present a few cases in which debiasing is provably detrimental in a statistical sense, notably when the regularization strength is large or the number of samples is small. These claims are validated experimentally on synthetic and real datasets, and should reopen the debate on whether debiasing is needed when using entropic OT.' volume: 162 URL: https://proceedings.mlr.press/v162/pooladian22a.html PDF: https://proceedings.mlr.press/v162/pooladian22a/pooladian22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pooladian22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aram-Alexandre family: Pooladian - given: Marco family: Cuturi - given: Jonathan family: Niles-Weed editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17830-17847 id: pooladian22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17830 lastpage: 17847 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Second Order Coresets for Data-efficient Machine Learning' abstract: 'Training machine learning models on massive datasets incurs substantial computational costs. To alleviate such costs, there has been a sustained effort to develop data-efficient training methods that can carefully select subsets of the training examples that generalize on par with the full training data. 
However, existing methods are limited in providing theoretical guarantees for the quality of the models trained on the extracted subsets, and may perform poorly in practice. We propose AdaCore, a method that leverages the geometry of the data to extract subsets of the training examples for efficient machine learning. The key idea behind our method is to dynamically approximate the curvature of the loss function via an exponentially-averaged estimate of the Hessian to select weighted subsets (coresets) that provide a close approximation of the full gradient preconditioned with the Hessian. We prove rigorous guarantees for the convergence of various first and second-order methods applied to the subsets chosen by AdaCore. Our extensive experiments show that AdaCore extracts coresets with higher quality compared to baselines and speeds up training of convex and non-convex machine learning models, such as logistic regression and neural networks, by over 2.9x over the full data and 4.5x over random subsets.' volume: 162 URL: https://proceedings.mlr.press/v162/pooladzandi22a.html PDF: https://proceedings.mlr.press/v162/pooladzandi22a/pooladzandi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-pooladzandi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Omead family: Pooladzandi - given: David family: Davini - given: Baharan family: Mirzasoleiman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17848-17869 id: pooladzandi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17848 lastpage: 17869 published: 2022-06-28 00:00:00 +0000 - title: 'On the Practicality of Deterministic Epistemic Uncertainty' abstract: 'A set of novel approaches for estimating epistemic uncertainty in deep neural networks with a single forward pass has recently emerged as a valid alternative to Bayesian Neural Networks. On the premise of informative representations, these deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution (OOD) data while adding negligible computational costs at inference time. However, it remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications - both prerequisites for their practical deployment. To this end, we first provide a taxonomy of DUMs, and evaluate their calibration under continuous distributional shifts. Then, we extend them to semantic segmentation. We find that, while DUMs scale to realistic vision tasks and perform well on OOD detection, the practicality of current methods is undermined by poor calibration under distributional shifts.' 
volume: 162 URL: https://proceedings.mlr.press/v162/postels22a.html PDF: https://proceedings.mlr.press/v162/postels22a/postels22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-postels22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Janis family: Postels - given: Mattia family: Segù - given: Tao family: Sun - given: Luca Daniel family: Sieber - given: Luc family: Van Gool - given: Fisher family: Yu - given: Federico family: Tombari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17870-17909 id: postels22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17870 lastpage: 17909 published: 2022-06-28 00:00:00 +0000 - title: 'A Simple Guard for Learned Optimizers' abstract: 'If the trend of learned components eventually outperforming their hand-crafted versions continues, learned optimizers will eventually outperform hand-crafted optimizers like SGD or Adam. Even if learned optimizers (L2Os) eventually outpace hand-crafted ones in practice, however, they are still not provably convergent and might fail out of distribution. These are the issues addressed here. Currently, learned optimizers frequently outperform generic hand-crafted optimizers (such as gradient descent) at the beginning of learning, but they generally plateau after some time while the generic algorithms continue to make progress and often overtake the learned algorithm, much as Aesop’s tortoise overtakes the hare. L2Os also still have a difficult time generalizing out of distribution. \cite{heaton_safeguarded_2020} proposed Safeguarded L2O (GL2O), which can take a learned optimizer and safeguard it with a generic learning algorithm so that, by conditionally switching between the two, the resulting algorithm is provably convergent. We propose a new class of Safeguarded L2O, called Loss-Guarded L2O (LGL2O), which is both conceptually simpler and computationally less expensive. The guarding mechanism decides solely based on the expected future loss value of both optimizers. Furthermore, we give a theoretical proof of LGL2O’s convergence guarantee and empirical results comparing it to GL2O and other baselines, showing that it combines the best of both L2O and SGD and that in practice it converges much better than GL2O.'
volume: 162 URL: https://proceedings.mlr.press/v162/premont-schwarz22a.html PDF: https://proceedings.mlr.press/v162/premont-schwarz22a/premont-schwarz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-premont-schwarz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Isabeau family: Prémont-Schwarz - given: Jaroslav family: Vı́tků - given: Jan family: Feyereisl editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17910-17925 id: premont-schwarz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17910 lastpage: 17925 published: 2022-06-28 00:00:00 +0000 - title: 'Hardness and Algorithms for Robust and Sparse Optimization' abstract: 'We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression. The goal of the sparse linear regression problem is to identify a small number of key features, while the goal of the robust linear regression problem is to identify a small number of erroneous measurements. Specifically, the sparse linear regression problem seeks a $k$-sparse vector $x\in\mathbb{R}^d$ to minimize $\|Ax-b\|_2$, given an input matrix $A\in\mathbb{R}^{n\times d}$ and a target vector $b\in\mathbb{R}^n$, while the robust linear regression problem seeks a set $S$ that ignores at most $k$ rows and a vector $x$ to minimize $\|(Ax-b)_S\|_2$. We first show bicriteria, NP-hardness of approximation for robust regression building on the work of \cite{ODonnellWZ15} which implies a similar result for sparse regression. We further show fine-grained hardness of robust regression through a reduction from the minimum-weight $k$-clique conjecture. On the positive side, we give an algorithm for robust regression that achieves arbitrarily accurate additive error and uses runtime that closely matches the lower bound from the fine-grained hardness result, as well as an algorithm for sparse regression with similar runtime. Both our upper and lower bounds rely on a general reduction from robust linear regression to sparse regression that we introduce. Our algorithms, inspired by the 3SUM problem, use approximate nearest neighbor data structures and may be of independent interest for solving sparse optimization problems. For instance, we demonstrate that our techniques can also be used for the well-studied sparse PCA problem.' 
volume: 162 URL: https://proceedings.mlr.press/v162/price22a.html PDF: https://proceedings.mlr.press/v162/price22a/price22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-price22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Eric family: Price - given: Sandeep family: Silwal - given: Samson family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17926-17944 id: price22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17926 lastpage: 17944 published: 2022-06-28 00:00:00 +0000 - title: 'Nonlinear Feature Diffusion on Hypergraphs' abstract: 'Hypergraphs are a common model for multiway relationships in data, and hypergraph semi-supervised learning is the problem of assigning labels to all nodes in a hypergraph, given labels on just a few nodes. Diffusions and label spreading are classical techniques for semi-supervised learning in the graph setting, and there are some standard ways to extend them to hypergraphs. However, these methods are linear models, and do not offer an obvious way of incorporating node features for making predictions. Here, we develop a nonlinear diffusion process on hypergraphs that spreads both features and labels following the hypergraph structure. Even though the process is nonlinear, we show global convergence to a unique limiting point for a broad class of nonlinearities and we show that such limit is the global minimum of a new regularized semi-supervised learning loss function which aims at reducing a generalized form of variance of the nodes across the hyperedges. The limiting point serves as a node embedding from which we make predictions with a linear model. Our approach is competitive with state-of-the-art graph and hypergraph neural networks, and also takes less time to train.' volume: 162 URL: https://proceedings.mlr.press/v162/prokopchik22a.html PDF: https://proceedings.mlr.press/v162/prokopchik22a/prokopchik22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-prokopchik22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Konstantin family: Prokopchik - given: Austin R family: Benson - given: Francesco family: Tudisco editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17945-17958 id: prokopchik22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17945 lastpage: 17958 published: 2022-06-28 00:00:00 +0000 - title: 'Universal Joint Approximation of Manifolds and Densities by Simple Injective Flows' abstract: 'We study approximation of probability measures supported on n-dimensional manifolds embedded in R^m by injective flows—neural networks composed of invertible flows and injective layers. We show that in general, injective flows between R^n and R^m universally approximate measures supported on images of extendable embeddings, which are a subset of standard embeddings: when the embedding dimension m is small, topological obstructions may preclude certain manifolds as admissible targets. 
When the embedding dimension is sufficiently large, m >= 3n+1, we use an argument from algebraic topology known as the clean trick to prove that the topological obstructions vanish and injective flows universally approximate any differentiable embedding. Along the way we show that the studied injective flows admit efficient projections on the range, and that their optimality can be established "in reverse," resolving a conjecture made in Brehmer & Cranmer 2020.' volume: 162 URL: https://proceedings.mlr.press/v162/puthawala22a.html PDF: https://proceedings.mlr.press/v162/puthawala22a/puthawala22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-puthawala22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Michael family: Puthawala - given: Matti family: Lassas - given: Ivan family: Dokmanic - given: Maarten family: De Hoop editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17959-17983 id: puthawala22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17959 lastpage: 17983 published: 2022-06-28 00:00:00 +0000 - title: 'The Teaching Dimension of Regularized Kernel Learners' abstract: 'Teaching dimension (TD) is a fundamental theoretical property for understanding machine teaching algorithms. It measures the sample complexity of teaching a target hypothesis to a learner. The TD of linear learners has been studied extensively, whereas the results of teaching non-linear learners are rare. A recent result investigates the TD of polynomial and Gaussian kernel learners. Unfortunately, the theoretical bounds therein show that the TD is high when teaching those non-linear learners. Inspired by the fact that regularization can reduce the learning complexity in machine learning, a natural question is whether the similar fact happens in machine teaching. To answer this essential question, this paper proposes a unified theoretical framework termed STARKE to analyze the TD of regularized kernel learners. On the basis of STARKE, we derive a generic result of any type of kernels. Furthermore, we disclose that the TD of regularized linear and regularized polynomial kernel learners can be strictly reduced. For regularized Gaussian kernel learners, we reveal that, although their TD is infinite, their epsilon-approximate TD can be exponentially reduced compared with that of the unregularized learners. The extensive experimental results of teaching the optimization-based learners verify the theoretical findings.' 
volume: 162 URL: https://proceedings.mlr.press/v162/qian22a.html PDF: https://proceedings.mlr.press/v162/qian22a/qian22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qian22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hong family: Qian - given: Xu-Hui family: Liu - given: Chen-Xi family: Su - given: Aimin family: Zhou - given: Yang family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 17984-18002 id: qian22a issued: date-parts: - 2022 - 6 - 28 firstpage: 17984 lastpage: 18002 published: 2022-06-28 00:00:00 +0000 - title: 'ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers' abstract: 'Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL learning in speech largely focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted variations, such as speaker variations, from the content. However, disentangling speakers is very challenging, because removing the speaker information could easily result in a loss of content as well, and the damage of the latter usually far outweighs the benefit of the former. In this paper, we propose a new SSL method that can achieve speaker disentanglement without severe loss of content. Our approach is adapted from the HuBERT framework, and incorporates disentangling mechanisms to regularize both the teacher labels and the learned representations. We evaluate the benefit of speaker disentanglement on a set of content-related downstream tasks, and observe a consistent and notable performance advantage of our speaker-disentangled representations.' volume: 162 URL: https://proceedings.mlr.press/v162/qian22b.html PDF: https://proceedings.mlr.press/v162/qian22b/qian22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qian22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kaizhi family: Qian - given: Yang family: Zhang - given: Heting family: Gao - given: Junrui family: Ni - given: Cheng-I family: Lai - given: David family: Cox - given: Mark family: Hasegawa-Johnson - given: Shiyu family: Chang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18003-18017 id: qian22b issued: date-parts: - 2022 - 6 - 28 firstpage: 18003 lastpage: 18017 published: 2022-06-28 00:00:00 +0000 - title: 'Interventional Contrastive Learning with Meta Semantic Regularizer' abstract: 'Contrastive learning (CL)-based self-supervised learning models learn visual representations in a pairwise manner. 
Although the prevailing CL model has achieved great progress, in this paper, we uncover an ever-overlooked phenomenon: When the CL model is trained with full images, the performance tested in full images is better than that in foreground areas; when the CL model is trained with foreground areas, the performance tested in full images is worse than that in foreground areas. This observation reveals that backgrounds in images may interfere with the model learning semantic information and their influence has not been fully eliminated. To tackle this issue, we build a Structural Causal Model (SCM) to model the background as a confounder. We propose a backdoor adjustment-based regularization method, namely Interventional Contrastive Learning with Meta Semantic Regularizer (ICL-MSR), to perform causal intervention towards the proposed SCM. ICL-MSR can be incorporated into any existing CL methods to alleviate background distractions from representation learning. Theoretically, we prove that ICL-MSR achieves a tighter error bound. Empirically, our experiments on multiple benchmark datasets demonstrate that ICL-MSR is able to improve the performances of different state-of-the-art CL methods.' volume: 162 URL: https://proceedings.mlr.press/v162/qiang22a.html PDF: https://proceedings.mlr.press/v162/qiang22a/qiang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qiang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wenwen family: Qiang - given: Jiangmeng family: Li - given: Changwen family: Zheng - given: Bing family: Su - given: Hui family: Xiong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18018-18030 id: qiang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18018 lastpage: 18030 published: 2022-06-28 00:00:00 +0000 - title: 'Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost' abstract: 'We study the problem of reinforcement learning (RL) with low (policy) switching cost {—} a problem well-motivated by real-life RL applications in which deployments of new policies are costly and the number of policy updates must be low. In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$. This is an exponential improvement over the best-known switching cost $O(H^2SA\log T)$ among existing methods with $\widetilde{O}(\mathrm{poly}(H,S,A)\sqrt{T})$ regret. In the above, $S,A$ denotes the number of states and actions in an $H$-horizon episodic Markov Decision Process model with unknown transitions, and $T$ is the number of steps. As a byproduct of our new techniques, we also derive a reward-free exploration algorithm with a switching cost of $O(HSA)$. Furthermore, we prove a pair of information-theoretical lower bounds which say that (1) Any no-regret algorithm must have a switching cost of $\Omega(HSA)$; (2) Any $\widetilde{O}(\sqrt{T})$ regret algorithm must incur a switching cost of $\Omega(HSA\log\log T)$. Both our algorithms are thus optimal in their switching costs.' 
volume: 162 URL: https://proceedings.mlr.press/v162/qiao22a.html PDF: https://proceedings.mlr.press/v162/qiao22a/qiao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qiao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dan family: Qiao - given: Ming family: Yin - given: Ming family: Min - given: Yu-Xiang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18031-18061 id: qiao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18031 lastpage: 18061 published: 2022-06-28 00:00:00 +0000 - title: 'Generalizing to Evolving Domains with Latent Structure-Aware Sequential Autoencoder' abstract: 'Domain generalization aims to improve the generalization capability of machine learning systems to out-of-distribution (OOD) data. Existing domain generalization techniques embark upon stationary and discrete environments to tackle the generalization issue caused by OOD data. However, many real-world tasks in non-stationary environments (e.g., self-driving car systems, sensor measurements) involve more complex and continuously evolving domain drift, which raises new challenges for the problem of domain generalization. In this paper, we formulate the aforementioned setting as the problem of evolving domain generalization. Specifically, we propose to introduce a probabilistic framework called Latent Structure-aware Sequential Autoencoder (LSSAE) to tackle the problem of evolving domain generalization via exploring the underlying continuous structure in the latent space of deep neural networks, where we aim to identify two major factors, namely covariate shift and concept shift, accounting for distribution shift in non-stationary environments. Experimental results on both synthetic and real-world datasets show that LSSAE can achieve superior performance in the evolving domain generalization setting.' volume: 162 URL: https://proceedings.mlr.press/v162/qin22a.html PDF: https://proceedings.mlr.press/v162/qin22a/qin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tiexin family: Qin - given: Shiqi family: Wang - given: Haoliang family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18062-18082 id: qin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18062 lastpage: 18082 published: 2022-06-28 00:00:00 +0000 - title: 'Graph Neural Architecture Search Under Distribution Shifts' abstract: 'Graph neural architecture search has shown great potential for automatically designing graph neural network (GNN) architectures for graph classification tasks. However, when there is a distribution shift between training and testing graphs, the existing approaches fail to deal with the problem of adapting to unknown test graph structures since they only search for a fixed architecture for all graphs.
To solve this problem, we propose a novel GRACES model which is able to generalize under distribution shifts through tailoring a customized GNN architecture suitable for each graph instance with unknown distribution. Specifically, we design a self-supervised disentangled graph encoder to characterize invariant factors hidden in diverse graph structures. Then, we propose a prototype-based architecture customization strategy to generate the most suitable GNN architecture weights in a continuous space for each graph instance. We further propose a customized super-network to share weights among different architectures for the sake of efficient training. Extensive experiments on both synthetic and real-world datasets demonstrate that our proposed GRACES model can adapt to diverse graph structures and achieve state-of-the-art performance for graph classification tasks under distribution shifts.' volume: 162 URL: https://proceedings.mlr.press/v162/qin22b.html PDF: https://proceedings.mlr.press/v162/qin22b/qin22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qin22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yijian family: Qin - given: Xin family: Wang - given: Ziwei family: Zhang - given: Pengtao family: Xie - given: Wenwu family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18083-18095 id: qin22b issued: date-parts: - 2022 - 6 - 28 firstpage: 18083 lastpage: 18095 published: 2022-06-28 00:00:00 +0000 - title: 'Spectral Representation of Robustness Measures for Optimization Under Input Uncertainty' abstract: 'We study the inference of mean-variance robustness measures to quantify input uncertainty under the Gaussian Process (GP) framework. These measures are widely used in applications where the robustness of the solution is of interest, for example, in engineering design. While the variance is commonly used to characterize the robustness, Bayesian inference of the variance using GPs is known to be challenging. In this paper, we propose a Spectral Representation of Robustness Measures based on the GP’s spectral representation, i.e., an analytical approach to approximately infer both robustness measures for normal and uniform input uncertainty distributions. We present two approximations based on different Fourier features and compare their accuracy numerically. To demonstrate their utility and efficacy in robust Bayesian Optimization, we integrate the analytical robustness measures in three standard acquisition functions for various robust optimization formulations. We show their competitive performance on numerical benchmarks and real-life applications.' 
volume: 162 URL: https://proceedings.mlr.press/v162/qing22a.html PDF: https://proceedings.mlr.press/v162/qing22a/qing22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qing22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jixiang family: Qing - given: Tom family: Dhaene - given: Ivo family: Couckuyt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18096-18121 id: qing22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18096 lastpage: 18121 published: 2022-06-28 00:00:00 +0000 - title: 'Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence' abstract: 'NDCG, namely Normalized Discounted Cumulative Gain, is a widely used ranking metric in information retrieval and machine learning. However, efficient and provable stochastic methods for maximizing NDCG are still lacking, especially for deep models. In this paper, we propose a principled approach to optimize NDCG and its top-$K$ variant. First, we formulate a novel compositional optimization problem for optimizing the NDCG surrogate, and a novel bilevel compositional optimization problem for optimizing the top-$K$ NDCG surrogate. Then, we develop efficient stochastic algorithms with provable convergence guarantees for the non-convex objectives. Different from existing NDCG optimization methods, the per-iteration complexity of our algorithms scales with the mini-batch size instead of the number of total items. To improve the effectiveness for deep learning, we further propose practical strategies by using initial warm-up and stop gradient operator. Experimental results on multiple datasets demonstrate that our methods outperform prior ranking approaches in terms of NDCG. To the best of our knowledge, this is the first time that stochastic algorithms are proposed to optimize NDCG with a provable convergence guarantee. Our proposed methods are implemented in the LibAUC library at https://libauc.org.' volume: 162 URL: https://proceedings.mlr.press/v162/qiu22a.html PDF: https://proceedings.mlr.press/v162/qiu22a/qiu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qiu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zi-Hao family: Qiu - given: Quanqi family: Hu - given: Yongjian family: Zhong - given: Lijun family: Zhang - given: Tianbao family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18122-18152 id: qiu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18122 lastpage: 18152 published: 2022-06-28 00:00:00 +0000 - title: 'Latent Outlier Exposure for Anomaly Detection with Contaminated Data' abstract: 'Anomaly detection aims at identifying data points that show systematic deviations from the majority of data in an unlabeled dataset. A common assumption is that clean training data (free of anomalies) is available, which is often violated in practice. 
We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models. The idea is to jointly infer binary labels to each datum (normal vs. anomalous) while updating the model parameters. Inspired by outlier exposure (Hendrycks et al., 2018) that considers synthetically created, labeled anomalies, we thereby use a combination of two losses that share parameters: one for the normal and one for the anomalous data. We then iteratively proceed with block coordinate updates on the parameters and the most likely (latent) labels. Our experiments with several backbone models on three image datasets, 30 tabular data sets, and a video anomaly detection benchmark showed consistent and significant improvements over the baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/qiu22b.html PDF: https://proceedings.mlr.press/v162/qiu22b/qiu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qiu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chen family: Qiu - given: Aodong family: Li - given: Marius family: Kloft - given: Maja family: Rudolph - given: Stephan family: Mandt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18153-18167 id: qiu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 18153 lastpage: 18167 published: 2022-06-28 00:00:00 +0000 - title: 'Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning' abstract: 'In view of its power in extracting feature representation, contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL), leading to efficient policy learning on various applications. Despite its tremendous empirical successes, the understanding of contrastive learning for RL remains elusive. To narrow such a gap, we study contrastive-learning empowered RL for a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions. For both models, we propose to extract the correct feature representations of the low-rank model by minimizing a contrastive loss. Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs. We further theoretically prove that our algorithm recovers the true representations and simultaneously achieves sample efficiency in learning the optimal policy and Nash equilibrium in MDPs and MGs. We also provide empirical studies to demonstrate the efficacy of the UCB-based contrastive learning method for RL. To the best of our knowledge, we provide the first provably efficient online RL algorithm that incorporates contrastive learning for representation learning.' 
volume: 162 URL: https://proceedings.mlr.press/v162/qiu22c.html PDF: https://proceedings.mlr.press/v162/qiu22c/qiu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qiu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shuang family: Qiu - given: Lingxiao family: Wang - given: Chenjia family: Bai - given: Zhuoran family: Yang - given: Zhaoran family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18168-18210 id: qiu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 18168 lastpage: 18210 published: 2022-06-28 00:00:00 +0000 - title: 'Fast and Provable Nonconvex Tensor RPCA' abstract: 'In this paper, we study nonconvex tensor robust principal component analysis (RPCA) based on the $t$-SVD. We first propose an alternating projection method, i.e., APT, which converges linearly to the ground-truth under the incoherence conditions of tensors. However, as the projection to the low-rank tensor space in APT can be slow, we further propose to speed up this process by utilizing the property of the tangent space of low-rank tensors. The resulting algorithm, i.e., EAPT, is not only more efficient than APT but also keeps the linear convergence. Compared with existing tensor RPCA works, the proposed method, especially EAPT, is not only more effective due to the recovery guarantee and adaptation in the transformed (frequency) domain but also more efficient due to its faster convergence rate and lower iteration complexity. These benefits are also empirically verified on both synthetic data and real applications, e.g., hyperspectral image denoising and video background subtraction.' volume: 162 URL: https://proceedings.mlr.press/v162/qiu22d.html PDF: https://proceedings.mlr.press/v162/qiu22d/qiu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qiu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haiquan family: Qiu - given: Yao family: Wang - given: Shaojie family: Tang - given: Deyu family: Meng - given: Quanming family: Yao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18211-18249 id: qiu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 18211 lastpage: 18249 published: 2022-06-28 00:00:00 +0000 - title: 'Generalized Federated Learning via Sharpness Aware Minimization' abstract: 'Federated Learning (FL) is a promising framework for performing privacy-preserving, distributed learning with a set of clients. However, the data distribution among clients is often non-IID, i.e., exhibits distribution shift, which makes efficient optimization difficult. To tackle this problem, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by increasing the performance of the global model. However, almost all algorithms leverage Empirical Risk Minimization (ERM) as the local optimizer, which easily makes the global model fall into a sharp valley and increases the deviation of parts of the local clients.
Therefore, in this paper, we revisit the solutions to the distribution shift problem in FL with a focus on local learning generality. To this end, we propose a general, effective algorithm, \texttt{FedSAM}, based on Sharpness Aware Minimization (SAM) local optimizer, and develop a momentum FL algorithm to bridge local and global models, \texttt{MoFedSAM}. Theoretically, we show the convergence analysis of these two algorithms and demonstrate the generalization bound of \texttt{FedSAM}. Empirically, our proposed algorithms substantially outperform existing FL studies and significantly decrease the learning deviation.' volume: 162 URL: https://proceedings.mlr.press/v162/qu22a.html PDF: https://proceedings.mlr.press/v162/qu22a/qu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhe family: Qu - given: Xingyu family: Li - given: Rui family: Duan - given: Yao family: Liu - given: Bo family: Tang - given: Zhuo family: Lu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18250-18280 id: qu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18250 lastpage: 18280 published: 2022-06-28 00:00:00 +0000 - title: 'Particle Transformer for Jet Tagging' abstract: 'Jet tagging is a critical yet challenging classification task in particle physics. While deep learning has transformed jet tagging and significantly improved performance, the lack of a large-scale public dataset impedes further enhancement. In this work, we present JetClass, a new comprehensive dataset for jet tagging. The JetClass dataset consists of 100 M jets, about two orders of magnitude larger than existing public datasets. A total of 10 types of jets are simulated, including several types unexplored for tagging so far. Based on the large dataset, we propose a new Transformer-based architecture for jet tagging, called Particle Transformer (ParT). By incorporating pairwise particle interactions in the attention mechanism, ParT achieves higher tagging performance than a plain Transformer and surpasses the previous state-of-the-art, ParticleNet, by a large margin. The pre-trained ParT models, once fine-tuned, also substantially enhance the performance on two widely adopted jet tagging benchmarks. The dataset, code and models are publicly available at https://github.com/jet-universe/particle_transformer.' 
volume: 162 URL: https://proceedings.mlr.press/v162/qu22b.html PDF: https://proceedings.mlr.press/v162/qu22b/qu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-qu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Huilin family: Qu - given: Congqiao family: Li - given: Sitian family: Qian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18281-18292 id: qu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 18281 lastpage: 18292 published: 2022-06-28 00:00:00 +0000 - title: 'Winning the Lottery Ahead of Time: Efficient Early Network Pruning' abstract: 'Pruning, the task of sparsifying deep neural networks, has received increasing attention recently. Although state-of-the-art pruning methods extract highly sparse models, they neglect two main challenges: (1) the process of finding these sparse models is often very expensive; (2) unstructured pruning does not provide benefits in terms of GPU memory, training time, or carbon emissions. We propose Early Compression via Gradient Flow Preservation (EarlyCroP), which efficiently extracts state-of-the-art sparse models before or early in training, addressing challenge (1), and can be applied in a structured manner, addressing challenge (2). This enables us to train sparse networks on commodity GPUs whose dense versions would be too large, thereby saving costs and reducing hardware requirements. We empirically show that EarlyCroP outperforms a rich set of baselines for many tasks (incl. classification, regression) and domains (incl. computer vision, natural language processing, and reinforcement learning). EarlyCroP leads to accuracy comparable to dense training while outperforming pruning baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/rachwan22a.html PDF: https://proceedings.mlr.press/v162/rachwan22a/rachwan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rachwan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: John family: Rachwan - given: Daniel family: Zügner - given: Bertrand family: Charpentier - given: Simon family: Geisler - given: Morgane family: Ayle - given: Stephan family: Günnemann editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18293-18309 id: rachwan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18293 lastpage: 18309 published: 2022-06-28 00:00:00 +0000 - title: 'Convergence of Uncertainty Sampling for Active Learning' abstract: 'Uncertainty sampling in active learning is heavily used in practice to reduce the annotation cost. However, there has been no wide consensus on the function to be used for uncertainty estimation in binary classification tasks, and convergence guarantees of the corresponding active learning algorithms are not well understood. The situation is even more challenging for multi-category classification.
In this work, we propose an efficient uncertainty estimator for binary classification, which we also extend to multiple classes, and provide a non-asymptotic rate of convergence for our uncertainty-sampling-based active learning algorithm in both cases under no-noise conditions (i.e., linearly separable data). We also extend our analysis to the noisy case and provide theoretical guarantees for our algorithm under the influence of noise in the task of binary and multi-class classification.' volume: 162 URL: https://proceedings.mlr.press/v162/raj22a.html PDF: https://proceedings.mlr.press/v162/raj22a/raj22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-raj22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Anant family: Raj - given: Francis family: Bach editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18310-18331 id: raj22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18310 lastpage: 18331 published: 2022-06-28 00:00:00 +0000 - title: 'DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale' abstract: 'As the training of giant dense models hits the boundary of the availability and capability of the hardware resources today, Mixture-of-Experts (MoE) models have become one of the most promising model architectures due to their significant training cost reduction compared to quality-equivalent dense models. Their training cost saving is demonstrated from encoder-decoder models (prior works) to a 5x saving for auto-regressive language models (this work). However, due to the much larger model size and unique architecture, how to provide fast MoE model inference remains challenging and unsolved, limiting their practical usage. To tackle this, we present DeepSpeed-MoE, an end-to-end MoE training and inference solution, including novel MoE architecture designs and model compression techniques that reduce MoE model size by up to 3.7x, and a highly optimized inference system that provides 7.3x better latency and cost compared to existing MoE inference solutions. DeepSpeed-MoE offers an unprecedented scale and efficiency to serve massive MoE models with up to 4.5x faster and 9x cheaper inference compared to quality-equivalent dense models. We hope our innovations and systems help open a promising path to new directions in the large model landscape, a shift from dense to sparse MoE models, where training and deploying higher-quality models with fewer resources becomes more widely possible.'
volume: 162 URL: https://proceedings.mlr.press/v162/rajbhandari22a.html PDF: https://proceedings.mlr.press/v162/rajbhandari22a/rajbhandari22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rajbhandari22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samyam family: Rajbhandari - given: Conglong family: Li - given: Zhewei family: Yao - given: Minjia family: Zhang - given: Reza Yazdani family: Aminabadi - given: Ammar Ahmad family: Awan - given: Jeff family: Rasley - given: Yuxiong family: He editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18332-18346 id: rajbhandari22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18332 lastpage: 18346 published: 2022-06-28 00:00:00 +0000 - title: 'Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization' abstract: 'Learning robust models that generalize well under changes in the data distribution is critical for real-world applications. To this end, there has been a growing surge of interest to learn simultaneously from multiple training domains - while enforcing different types of invariance across those domains. Yet, all existing approaches fail to show systematic benefits under controlled evaluation protocols. In this paper, we introduce a new regularization - named Fishr - that enforces domain invariance in the space of the gradients of the loss: specifically, the domain-level variances of gradients are matched across training domains. Our approach is based on the close relations between the gradient covariance, the Fisher Information and the Hessian of the loss: in particular, we show that Fishr eventually aligns the domain-level loss landscapes locally around the final weights. Extensive experiments demonstrate the effectiveness of Fishr for out-of-distribution generalization. Notably, Fishr improves the state of the art on the DomainBed benchmark and performs consistently better than Empirical Risk Minimization. Our code is available at https://github.com/alexrame/fishr.' volume: 162 URL: https://proceedings.mlr.press/v162/rame22a.html PDF: https://proceedings.mlr.press/v162/rame22a/rame22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rame22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexandre family: Rame - given: Corentin family: Dancette - given: Matthieu family: Cord editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18347-18377 id: rame22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18347 lastpage: 18377 published: 2022-06-28 00:00:00 +0000 - title: 'A Closer Look at Smoothness in Domain Adversarial Training' abstract: 'Domain adversarial training has been ubiquitous for achieving invariant representations and is used widely for various domain adaptation tasks. In recent times, methods converging to smooth optima have shown improved generalization for supervised learning tasks like classification. 
In this work, we analyze the effect of smoothness-enhancing formulations on domain adversarial training, the objective of which is a combination of task loss (e.g., classification, regression, etc.) and adversarial terms. We find that converging to a smooth minimum with respect to (w.r.t.) the task loss stabilizes the adversarial training, leading to better performance on the target domain. In contrast to the task loss, our analysis shows that converging to a smooth minimum w.r.t. the adversarial loss leads to sub-optimal generalization on the target domain. Based on the analysis, we introduce the Smooth Domain Adversarial Training (SDAT) procedure, which effectively enhances the performance of existing domain adversarial methods for both classification and object detection tasks. Our analysis also provides insight into the extensive usage of SGD over Adam in the community for domain adversarial training.' volume: 162 URL: https://proceedings.mlr.press/v162/rangwani22a.html PDF: https://proceedings.mlr.press/v162/rangwani22a/rangwani22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rangwani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Harsh family: Rangwani - given: Sumukh K family: Aithal - given: Mayank family: Mishra - given: Arihant family: Jain - given: Venkatesh Babu family: Radhakrishnan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18378-18399 id: rangwani22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18378 lastpage: 18399 published: 2022-06-28 00:00:00 +0000 - title: 'Linear Adversarial Concept Erasure' abstract: 'Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly being used in real-world applications, the inability to control their content becomes an increasingly important problem. In this work, we formulate the problem of identifying a linear subspace that corresponds to a given concept, and removing it from the representation. We formulate this problem as a constrained, linear minimax game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation that works well for others. When evaluated in the context of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation. Surprisingly, we show that the method—despite being linear—is highly expressive, effectively mitigating bias in the output layers of deep, nonlinear classifiers while maintaining tractability and interpretability.'
volume: 162 URL: https://proceedings.mlr.press/v162/ravfogel22a.html PDF: https://proceedings.mlr.press/v162/ravfogel22a/ravfogel22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ravfogel22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shauli family: Ravfogel - given: Michael family: Twiton - given: Yoav family: Goldberg - given: Ryan D family: Cotterell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18400-18421 id: ravfogel22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18400 lastpage: 18421 published: 2022-06-28 00:00:00 +0000 - title: 'Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks' abstract: 'In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy, and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization discouraging locality, and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of conventional wisdom by which architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.' volume: 162 URL: https://proceedings.mlr.press/v162/razin22a.html PDF: https://proceedings.mlr.press/v162/razin22a/razin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-razin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Noam family: Razin - given: Asaf family: Maman - given: Nadav family: Cohen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18422-18462 id: razin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18422 lastpage: 18462 published: 2022-06-28 00:00:00 +0000 - title: 'One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes' abstract: 'In this paper, we initiate the study of one-pass algorithms for solving the maximum-a-posteriori (MAP) inference problem for Non-symmetric Determinantal Point Processes (NDPPs). In particular, we formulate streaming and online versions of the problem and provide one-pass algorithms for solving these problems. 
In our streaming setting, data points arrive in an arbitrary order and the algorithms are constrained to use a single-pass over the data as well as sub-linear memory, and only need to output a valid solution at the end of the stream. Our online setting has an additional requirement of maintaining a valid solution at any point in time. We design new one-pass algorithms for these problems and show that they perform comparably to (or even better than) the offline greedy algorithm while using substantially lower memory.' volume: 162 URL: https://proceedings.mlr.press/v162/reddy22a.html PDF: https://proceedings.mlr.press/v162/reddy22a/reddy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-reddy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aravind family: Reddy - given: Ryan A. family: Rossi - given: Zhao family: Song - given: Anup family: Rao - given: Tung family: Mai - given: Nedim family: Lipka - given: Gang family: Wu - given: Eunyee family: Koh - given: Nesreen family: Ahmed editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18463-18482 id: reddy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18463 lastpage: 18482 published: 2022-06-28 00:00:00 +0000 - title: 'Universality of Winning Tickets: A Renormalization Group Perspective' abstract: 'Foundational work on the Lottery Ticket Hypothesis has suggested an exciting corollary: winning tickets found in the context of one task can be transferred to similar tasks, possibly even across different architectures. This has generated broad interest, but methods to study this universality are lacking. We make use of renormalization group theory, a powerful tool from theoretical physics, to address this need. We find that iterative magnitude pruning, the principal algorithm used for discovering winning tickets, is a renormalization group scheme, and can be viewed as inducing a flow in parameter space. We demonstrate that ResNet-50 models with transferable winning tickets have flows with common properties, as would be expected from the theory. Similar observations are made for BERT models, with evidence that their flows are near fixed points. Additionally, we leverage our framework to study winning tickets transferred across ResNet architectures, observing that smaller models have flows with more uniform properties than larger models, complicating transfer between them.' volume: 162 URL: https://proceedings.mlr.press/v162/redman22a.html PDF: https://proceedings.mlr.press/v162/redman22a/redman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-redman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: William T family: Redman - given: Tianlong family: Chen - given: Zhangyang family: Wang - given: Akshunna S. 
family: Dogra editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18483-18498 id: redman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18483 lastpage: 18498 published: 2022-06-28 00:00:00 +0000 - title: 'The dynamics of representation learning in shallow, non-linear autoencoders' abstract: 'Autoencoders are the simplest neural networks for unsupervised learning, and thus an ideal framework for studying feature learning. While a detailed understanding of the dynamics of linear autoencoders has recently been obtained, the study of non-linear autoencoders has been hindered by the technical difficulty of handling training data with non-trivial correlations – a fundamental prerequisite for feature extraction. Here, we study the dynamics of feature learning in non-linear, shallow autoencoders. We derive a set of asymptotically exact equations that describe the generalisation dynamics of autoencoders trained with stochastic gradient descent (SGD) in the limit of high-dimensional inputs. These equations reveal that autoencoders learn the leading principal components of their inputs sequentially. An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights, and highlights the importance of training the bias in ReLU autoencoders. Building on previous results for linear networks, we analyse a modification of the vanilla SGD algorithm which allows learning of the exact principal components. Finally, we show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets such as CIFAR10.' volume: 162 URL: https://proceedings.mlr.press/v162/refinetti22a.html PDF: https://proceedings.mlr.press/v162/refinetti22a/refinetti22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-refinetti22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Maria family: Refinetti - given: Sebastian family: Goldt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18499-18519 id: refinetti22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18499 lastpage: 18519 published: 2022-06-28 00:00:00 +0000 - title: 'Proximal Exploration for Model-guided Protein Sequence Design' abstract: 'Designing protein sequences with a particular biological function is a long-lasting challenge for protein engineering. Recent advances in machine-learning-guided approaches focus on building a surrogate sequence-function model to reduce the burden of expensive in-lab experiments. In this paper, we study the exploration mechanism of model-guided sequence design. We leverage a natural property of the protein fitness landscape: a concise set of mutations upon the wild-type sequence is usually sufficient to enhance the desired function. By utilizing this property, we propose the Proximal Exploration (PEX) algorithm, which prioritizes the evolutionary search for high-fitness mutants with low mutation counts.
In addition, we develop a specialized model architecture, called Mutation Factorization Network (MuFacNet), to predict low-order mutational effects, which further improves the sample efficiency of model-guided evolution. In experiments, we extensively evaluate our method on a suite of in-silico protein sequence design tasks and demonstrate substantial improvement over baseline algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/ren22a.html PDF: https://proceedings.mlr.press/v162/ren22a/ren22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ren22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhizhou family: Ren - given: Jiahan family: Li - given: Fan family: Ding - given: Yuan family: Zhou - given: Jianzhu family: Ma - given: Jian family: Peng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18520-18536 id: ren22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18520 lastpage: 18536 published: 2022-06-28 00:00:00 +0000 - title: 'Towards Theoretical Analysis of Transformation Complexity of ReLU DNNs' abstract: 'This paper aims to theoretically analyze the complexity of feature transformations encoded in piecewise linear DNNs with ReLU layers. We propose metrics to measure three types of complexities of transformations based on the information theory. We further discover and prove the strong correlation between the complexity and the disentanglement of transformations. Based on the proposed metrics, we analyze two typical phenomena of the change of the transformation complexity during the training process, and explore the ceiling of a DNN’s complexity. The proposed metrics can also be used as a loss to learn a DNN with the minimum complexity, which also controls the over-fitting level of the DNN and influences adversarial robustness, adversarial transferability, and knowledge consistency. Comprehensive comparative studies have provided new perspectives to understand the DNN. The code is released at https://github.com/sjtu-XAI-lab/transformation-complexity.' volume: 162 URL: https://proceedings.mlr.press/v162/ren22b.html PDF: https://proceedings.mlr.press/v162/ren22b/ren22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ren22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jie family: Ren - given: Mingjie family: Li - given: Meng family: Zhou - given: Shih-Han family: Chan - given: Quanshi family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18537-18558 id: ren22b issued: date-parts: - 2022 - 6 - 28 firstpage: 18537 lastpage: 18558 published: 2022-06-28 00:00:00 +0000 - title: 'Benchmarking and Analyzing Point Cloud Classification under Corruptions' abstract: '3D perception, especially point cloud classification, has achieved substantial progress. However, in real-world deployment, point cloud corruptions are inevitable due to the scene complexity, sensor inaccuracy, and processing imprecision. 
In this work, we aim to rigorously benchmark and analyze point cloud classification under corruptions. To conduct a systematic investigation, we first provide a taxonomy of common 3D corruptions and identify the atomic corruptions. Then, we perform a comprehensive evaluation on a wide range of representative point cloud models to understand their robustness and generalizability. Our benchmark results show that although point cloud classification performance improves over time, the state-of-the-art methods are on the verge of being less robust. Based on the obtained observations, we propose several effective techniques to enhance point cloud classifier robustness. We hope our comprehensive benchmark, in-depth analysis, and proposed techniques could spark future research in robust 3D perception.' volume: 162 URL: https://proceedings.mlr.press/v162/ren22c.html PDF: https://proceedings.mlr.press/v162/ren22c/ren22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ren22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiawei family: Ren - given: Liang family: Pan - given: Ziwei family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18559-18575 id: ren22c issued: date-parts: - 2022 - 6 - 28 firstpage: 18559 lastpage: 18575 published: 2022-06-28 00:00:00 +0000 - title: 'A Unified View on PAC-Bayes Bounds for Meta-Learning' abstract: 'Meta learning automatically infers an inductive bias, which includes the hyperparameters of the base-learning algorithm, by observing data from a finite number of related tasks. This paper studies PAC-Bayes bounds on the meta-generalization gap. The meta-generalization gap comprises two sources of generalization gaps: the environment-level and task-level gaps resulting from observation of a finite number of tasks and data samples per task, respectively. In this paper, by upper bounding arbitrary convex functions, which link the expected and empirical losses at the environment and also per-task levels, we obtain new PAC-Bayes bounds. Using these bounds, we develop new PAC-Bayes meta-learning algorithms.
Numerical examples demonstrate the merits of the proposed novel bounds and algorithm in comparison to prior PAC-Bayes bounds for meta-learning' volume: 162 URL: https://proceedings.mlr.press/v162/rezazadeh22a.html PDF: https://proceedings.mlr.press/v162/rezazadeh22a/rezazadeh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rezazadeh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Arezou family: Rezazadeh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18576-18595 id: rezazadeh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18576 lastpage: 18595 published: 2022-06-28 00:00:00 +0000 - title: '3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation' abstract: 'We propose and study a new class of gradient compressors for communication-efficient training—three point compressors (3PC)—as well as efficient distributed nonconvex optimization algorithms that can take advantage of them. Unlike most established approaches, which rely on a static compressor choice (e.g., TopK), our class allows the compressors to evolve throughout the training process, with the aim of improving the theoretical communication complexity and practical efficiency of the underlying methods. We show that our general approach can recover the recently proposed state-of-the-art error feedback mechanism EF21 (Richtárik et al, 2021) and its theoretical properties as a special case, but also leads to a number of new efficient methods. Notably, our approach allows us to improve upon the state-of-the-art in the algorithmic and theoretical foundations of the lazy aggregation literature (Liu et al, 2017; Lan et al, 2017). As a by-product that may be of independent interest, we provide a new and fundamental link between the lazy aggregation and error feedback literature. A special feature of our work is that we do not require the compressors to be unbiased.' volume: 162 URL: https://proceedings.mlr.press/v162/richtarik22a.html PDF: https://proceedings.mlr.press/v162/richtarik22a/richtarik22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-richtarik22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peter family: Richtarik - given: Igor family: Sokolov - given: Elnur family: Gasanov - given: Ilyas family: Fatkhullin - given: Zhize family: Li - given: Eduard family: Gorbunov editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18596-18648 id: richtarik22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18596 lastpage: 18648 published: 2022-06-28 00:00:00 +0000 - title: 'Robust SDE-Based Variational Formulations for Solving Linear PDEs via Deep Learning' abstract: 'The combination of Monte Carlo methods and deep learning has recently led to efficient algorithms for solving partial differential equations (PDEs) in high dimensions. 
Related learning problems are often stated as variational formulations based on associated stochastic differential equations (SDEs), which allow the minimization of corresponding losses using gradient-based optimization methods. In respective numerical implementations it is therefore crucial to rely on adequate gradient estimators that exhibit low variance in order to reach convergence accurately and swiftly. In this article, we rigorously investigate corresponding numerical aspects that appear in the context of linear Kolmogorov PDEs. In particular, we systematically compare existing deep learning approaches and provide theoretical explanations for their performances. Subsequently, we suggest novel methods that can be shown to be more robust both theoretically and numerically, leading to substantial performance improvements.' volume: 162 URL: https://proceedings.mlr.press/v162/richter22a.html PDF: https://proceedings.mlr.press/v162/richter22a/richter22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-richter22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lorenz family: Richter - given: Julius family: Berner editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18649-18666 id: richter22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18649 lastpage: 18666 published: 2022-06-28 00:00:00 +0000 - title: 'Probabilistically Robust Learning: Balancing Average and Worst-case Performance' abstract: 'Many of the successes of machine learning are based on minimizing an averaged loss function. However, it is well-known that this paradigm suffers from robustness issues that hinder its applicability in safety-critical domains. These issues are often addressed by training against worst-case perturbations of data, a technique known as adversarial training. Although empirically effective, adversarial training can be overly conservative, leading to unfavorable trade-offs between nominal performance and robustness. To this end, in this paper we propose a framework called probabilistic robustness that bridges the gap between the accurate, yet brittle average case and the robust, yet conservative worst case by enforcing robustness to most rather than to all perturbations. From a theoretical point of view, this framework overcomes the trade-offs between the performance and the sample-complexity of worst-case and average-case learning. From a practical point of view, we propose a novel algorithm based on risk-aware optimization that effectively balances average- and worst-case performance at a considerably lower computational cost relative to adversarial training. Our results on MNIST, CIFAR-10, and SVHN illustrate the advantages of this framework on the spectrum from average- to worst-case robustness. Our code is available at: https://github.com/arobey1/advbench.' 
volume: 162 URL: https://proceedings.mlr.press/v162/robey22a.html PDF: https://proceedings.mlr.press/v162/robey22a/robey22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-robey22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexander family: Robey - given: Luiz family: Chamon - given: George J. family: Pappas - given: Hamed family: Hassani editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18667-18686 id: robey22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18667 lastpage: 18686 published: 2022-06-28 00:00:00 +0000 - title: 'LyaNet: A Lyapunov Framework for Training Neural ODEs' abstract: 'We propose a method for training ordinary differential equations by using a control-theoretic Lyapunov condition for stability. Our approach, called LyaNet, is based on a novel Lyapunov loss formulation that encourages the inference dynamics to converge quickly to the correct prediction. Theoretically, we show that minimizing Lyapunov loss guarantees exponential convergence to the correct solution and enables a novel robustness guarantee. We also provide practical algorithms, including one that avoids the cost of backpropagating through a solver or using the adjoint method. Relative to standard Neural ODE training, we empirically find that LyaNet can offer improved prediction performance, faster convergence of inference dynamics, and improved adversarial robustness. Our code is available at https://github.com/ivandariojr/LyapunovLearning.' volume: 162 URL: https://proceedings.mlr.press/v162/rodriguez22a.html PDF: https://proceedings.mlr.press/v162/rodriguez22a/rodriguez22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rodriguez22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ivan Dario Jimenez family: Rodriguez - given: Aaron family: Ames - given: Yisong family: Yue editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18687-18703 id: rodriguez22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18687 lastpage: 18703 published: 2022-06-28 00:00:00 +0000 - title: 'Short-Term Plasticity Neurons Learning to Learn and Forget' abstract: 'Short-term plasticity (STP) is a mechanism that stores decaying memories in synapses of the cerebral cortex. In computing practice, STP has been used, but mostly in the niche of spiking neurons, even though theory predicts that it is the optimal solution to certain dynamic tasks. Here we present a new type of recurrent neural unit, the STP Neuron (STPN), which indeed turns out strikingly powerful. Its key mechanism is that synapses have a state, propagated through time by a self-recurrent connection-within-the-synapse. This formulation enables training the plasticity with backpropagation through time, resulting in a form of learning to learn and forget in the short term. The STPN outperforms all tested alternatives, i.e. RNNs, LSTMs, other models with fast weights, and differentiable plasticity. 
We confirm this in both supervised and reinforcement learning (RL), and in tasks such as Associative Retrieval, Maze Exploration, Atari video games, and MuJoCo robotics. Moreover, we calculate that, in neuromorphic or biological circuits, the STPN minimizes energy consumption across models, as it depresses individual synapses dynamically. Based on these, biological STP may have been a strong evolutionary attractor that maximizes both efficiency and computational power. The STPN now brings these neuromorphic advantages also to a broad spectrum of machine learning practice. Code is available in https://github.com/NeuromorphicComputing/stpn.' volume: 162 URL: https://proceedings.mlr.press/v162/rodriguez22b.html PDF: https://proceedings.mlr.press/v162/rodriguez22b/rodriguez22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rodriguez22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hector Garcia family: Rodriguez - given: Qinghai family: Guo - given: Timoleon family: Moraitis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18704-18722 id: rodriguez22b issued: date-parts: - 2022 - 6 - 28 firstpage: 18704 lastpage: 18722 published: 2022-06-28 00:00:00 +0000 - title: 'Function-space Inference with Sparse Implicit Processes' abstract: 'Implicit Processes (IPs) represent a flexible framework that can be used to describe a wide variety of models, from Bayesian neural networks, neural samplers and data generators to many others. IPs also allow for approximate inference in function-space. This change of formulation solves intrinsic degenerate problems of parameter-space approximate inference concerning the high number of parameters and their strong dependencies in large models. For this, previous works in the literature have attempted to employ IPs both to set up the prior and to approximate the resulting posterior. However, this has proven to be a challenging task. Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods producing flexible predictive distributions by using another IP to approximate the posterior process cannot tune the prior IP to the observed data. We propose here the first method that can accomplish both goals. For this, we rely on an inducing-point representation of the prior IP, as often done in the context of sparse Gaussian processes. The result is a scalable method for approximate inference with IPs that can tune the prior IP parameters to the data, and that provides accurate non-Gaussian predictive distributions.' 
volume: 162 URL: https://proceedings.mlr.press/v162/rodri-guez-santana22a.html PDF: https://proceedings.mlr.press/v162/rodri-guez-santana22a/rodri-guez-santana22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rodri-guez-santana22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Simon family: Rodríguez-Santana - given: Bryan family: Zaldivar - given: Daniel family: Hernandez-Lobato editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18723-18740 id: rodri-guez-santana22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18723 lastpage: 18740 published: 2022-06-28 00:00:00 +0000 - title: 'Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models' abstract: 'This paper demonstrates how to recover causal graphs from the score of the data distribution in non-linear additive (Gaussian) noise models. Using score matching algorithms as a building block, we show how to design a new generation of scalable causal discovery methods. To showcase our approach, we also propose a new efficient method for approximating the score’s Jacobian, enabling recovery of the causal graph. Empirically, we find that the new algorithm, called SCORE, is competitive with state-of-the-art causal discovery methods while being significantly faster.' volume: 162 URL: https://proceedings.mlr.press/v162/rolland22a.html PDF: https://proceedings.mlr.press/v162/rolland22a/rolland22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rolland22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Paul family: Rolland - given: Volkan family: Cevher - given: Matthäus family: Kleindessner - given: Chris family: Russell - given: Dominik family: Janzing - given: Bernhard family: Schölkopf - given: Francesco family: Locatello editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18741-18753 id: rolland22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18741 lastpage: 18753 published: 2022-06-28 00:00:00 +0000 - title: 'Dual Decomposition of Convex Optimization Layers for Consistent Attention in Medical Images' abstract: 'A key concern in integrating machine learning models in medicine is the ability to interpret their reasoning. Popular explainability methods have demonstrated satisfactory results in natural image recognition, yet in medical image analysis, many of these approaches provide partial and noisy explanations. Recently, attention mechanisms have shown compelling results both in their predictive performance and in their interpretable qualities. A fundamental trait of attention is that it leverages salient parts of the input which contribute to the model’s prediction. To this end, our work focuses on the explanatory value of attention weight distributions. We propose a multi-layer attention mechanism that enforces consistent interpretations between attended convolutional layers using convex optimization.
We apply duality to decompose the consistency constraints between the layers by reparameterizing their attention probability distributions. We further suggest learning the dual witness by optimizing with respect to our objective; thus, our implementation uses standard back-propagation, hence it is highly efficient. While preserving predictive performance, our proposed method leverages weakly annotated medical imaging data and provides complete and faithful explanations to the model’s prediction.' volume: 162 URL: https://proceedings.mlr.press/v162/ron22a.html PDF: https://proceedings.mlr.press/v162/ron22a/ron22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ron22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tom family: Ron - given: Tamir family: Hazan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18754-18769 id: ron22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18754 lastpage: 18769 published: 2022-06-28 00:00:00 +0000 - title: 'A Consistent and Efficient Evaluation Strategy for Attribution Methods' abstract: 'With a variety of local feature attribution methods being proposed in recent years, follow-up work suggested several evaluation strategies. To assess the attribution quality across different attribution techniques, the most popular among these evaluation strategies in the image domain use pixel perturbations. However, recent advances discovered that different evaluation strategies produce conflicting rankings of attribution methods and can be prohibitively expensive to compute. In this work, we present an information-theoretic analysis of evaluation strategies based on pixel perturbations. Our findings reveal that the results are strongly affected by information leakage through the shape of the removed pixels as opposed to their actual values. Using our theoretical insights, we propose a novel evaluation framework termed Remove and Debias (ROAD) which offers two contributions: First, it mitigates the impact of the confounders, which entails higher consistency among evaluation strategies. Second, ROAD does not require the computationally expensive retraining step and saves up to 99% in computational costs compared to the state-of-the-art. We release our source code at https://github.com/tleemann/road_evaluation.' 
volume: 162 URL: https://proceedings.mlr.press/v162/rong22a.html PDF: https://proceedings.mlr.press/v162/rong22a/rong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yao family: Rong - given: Tobias family: Leemann - given: Vadim family: Borisov - given: Gjergji family: Kasneci - given: Enkelejda family: Kasneci editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18770-18795 id: rong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18770 lastpage: 18795 published: 2022-06-28 00:00:00 +0000 - title: 'Efficiently Learning the Topology and Behavior of a Networked Dynamical System Via Active Queries' abstract: 'Using a discrete dynamical system model, many papers have addressed the problem of learning the behavior (i.e., the local function at each node) of a networked system through active queries, assuming that the network topology is known. We address the problem of inferring both the topology of the network and the behavior of a discrete dynamical system through active queries. We consider two query models studied in the literature, namely the batch model (where all the queries must be submitted together) and the adaptive model (where responses to previous queries can be used in formulating a new query). Our results are for systems where the state of each node is from {0,1} and the local functions are Boolean. We present algorithms to learn the topology and the behavior under both batch and adaptive query models for several classes of dynamical systems. These algorithms use only a polynomial number of queries. We also present experimental results obtained by running our query generation algorithms on synthetic and real-world networks.' volume: 162 URL: https://proceedings.mlr.press/v162/rosenkrantz22a.html PDF: https://proceedings.mlr.press/v162/rosenkrantz22a/rosenkrantz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rosenkrantz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniel J family: Rosenkrantz - given: Abhijin family: Adiga - given: Madhav family: Marathe - given: Zirou family: Qiu - given: S S family: Ravi - given: Richard family: Stearns - given: Anil family: Vullikanti editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18796-18808 id: rosenkrantz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18796 lastpage: 18808 published: 2022-06-28 00:00:00 +0000 - title: 'Learning to Infer Structures of Network Games' abstract: 'Strategic interactions between a group of individuals or organisations can be modelled as games played on networks, where a player’s payoff depends not only on their actions but also on those of their neighbours. Inferring the network structure from observed game outcomes (equilibrium actions) is an important problem with numerous potential applications in economics and social sciences. 
Existing methods mostly require the knowledge of the utility function associated with the game, which is often unrealistic to obtain in real-world scenarios. We adopt a transformer-like architecture which correctly accounts for the symmetries of the problem and learns a mapping from the equilibrium actions to the network structure of the game without explicit knowledge of the utility function. We test our method on three different types of network games using both synthetic and real-world data, and demonstrate its effectiveness in network structure inference and superior performance over existing methods.' volume: 162 URL: https://proceedings.mlr.press/v162/rossi22a.html PDF: https://proceedings.mlr.press/v162/rossi22a/rossi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rossi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Emanuele family: Rossi - given: Federico family: Monti - given: Yan family: Leng - given: Michael family: Bronstein - given: Xiaowen family: Dong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18809-18827 id: rossi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18809 lastpage: 18827 published: 2022-06-28 00:00:00 +0000 - title: 'Direct Behavior Specification via Constrained Reinforcement Learning' abstract: 'The standard formulation of Reinforcement Learning lacks a practical way of specifying what are admissible and forbidden behaviors. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent for reward specification in applied RL projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods to automatically weigh each of these behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to several constraints simultaneously. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.' 
volume: 162 URL: https://proceedings.mlr.press/v162/roy22a.html PDF: https://proceedings.mlr.press/v162/roy22a/roy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-roy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Julien family: Roy - given: Roger family: Girgis - given: Joshua family: Romoff - given: Pierre-Luc family: Bacon - given: Chris J family: Pal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18828-18843 id: roy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18828 lastpage: 18843 published: 2022-06-28 00:00:00 +0000 - title: 'Constraint-based graph network simulator' abstract: 'In the area of physical simulations, nearly all neural-network-based methods directly predict future states from the input states. However, many traditional simulation engines instead model the constraints of the system and select the state which satisfies them. Here we present a framework for constraint-based learned simulation, where a scalar constraint function is implemented as a graph neural network, and future predictions are computed by solving the optimization problem defined by the learned constraint. Our model achieves comparable or better accuracy to top learned simulators on a variety of challenging physical domains, and offers several unique advantages. We can improve the simulation accuracy on a larger system by applying more solver iterations at test time. We also can incorporate novel hand-designed constraints at test time and simulate new dynamics which were not present in the training data. Our constraint-based framework shows how key techniques from traditional simulation and numerical methods can be leveraged as inductive biases in machine learning simulators.' volume: 162 URL: https://proceedings.mlr.press/v162/rubanova22a.html PDF: https://proceedings.mlr.press/v162/rubanova22a/rubanova22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rubanova22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yulia family: Rubanova - given: Alvaro family: Sanchez-Gonzalez - given: Tobias family: Pfaff - given: Peter family: Battaglia editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18844-18870 id: rubanova22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18844 lastpage: 18870 published: 2022-06-28 00:00:00 +0000 - title: 'Continual Learning via Sequential Function-Space Variational Inference' abstract: 'Sequential Bayesian inference over predictive functions is a natural framework for continual learning from streams of data. However, applying it to neural networks has proved challenging in practice. Addressing the drawbacks of existing techniques, we propose an optimization objective derived by formulating continual learning as sequential function-space variational inference. In contrast to existing methods that regularize neural network parameters directly, this objective allows parameters to vary widely during training, enabling better adaptation to new tasks. 
Compared to objectives that directly regularize neural network predictions, the proposed objective allows for more flexible variational distributions and more effective regularization. We demonstrate that, across a range of task sequences, neural networks trained via sequential function-space variational inference achieve better predictive accuracy than networks trained with related methods while depending less on maintaining a set of representative points from previous tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/rudner22a.html PDF: https://proceedings.mlr.press/v162/rudner22a/rudner22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rudner22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tim G. J. family: Rudner - given: Freddie family: Bickford Smith - given: Qixuan family: Feng - given: Yee Whye family: Teh - given: Yarin family: Gal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18871-18887 id: rudner22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18871 lastpage: 18887 published: 2022-06-28 00:00:00 +0000 - title: 'Graph-Coupled Oscillator Networks' abstract: 'We propose Graph-Coupled Oscillator Networks (GraphCON), a novel framework for deep learning on graphs. It is based on discretizations of a second-order system of ordinary differential equations (ODEs), which model a network of nonlinear controlled and damped oscillators, coupled via the adjacency structure of the underlying graph. The flexibility of our framework permits any basic GNN layer (e.g. convolutional or attentional) as the coupling function, from which a multi-layer deep neural network is built up via the dynamics of the proposed ODEs. We relate the oversmoothing problem, commonly encountered in GNNs, to the stability of steady states of the underlying ODE and show that zero-Dirichlet energy steady states are not stable for our proposed ODEs. This demonstrates that the proposed framework mitigates the oversmoothing problem. Moreover, we prove that GraphCON mitigates the exploding and vanishing gradients problem to facilitate training of deep multi-layer GNNs. Finally, we show that our approach offers competitive performance with respect to the state-of-the-art on a variety of graph-based learning tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/rusch22a.html PDF: https://proceedings.mlr.press/v162/rusch22a/rusch22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rusch22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: T. 
Konstantin family: Rusch - given: Ben family: Chamberlain - given: James family: Rowbottom - given: Siddhartha family: Mishra - given: Michael family: Bronstein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18888-18909 id: rusch22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18888 lastpage: 18909 published: 2022-06-28 00:00:00 +0000 - title: 'Hindering Adversarial Attacks with Implicit Neural Representations' abstract: 'We introduce the Lossy Implicit Network Activation Coding (LINAC) defence, an input transformation which successfully hinders several common adversarial attacks on CIFAR-10 classifiers for perturbations up to 8/255 in Linf norm and 0.5 in L2 norm. Implicit neural representations are used to approximately encode pixel colour intensities in 2D images such that classifiers trained on transformed data appear to have robustness to small perturbations without adversarial training or large drops in performance. The seed of the random number generator used to initialise and train the implicit neural representation turns out to be necessary information for stronger generic attacks, suggesting its role as a private key. We devise a Parametric Bypass Approximation (PBA) attack strategy for key-based defences, which successfully invalidates an existing method in this category. Interestingly, our LINAC defence also hinders some transfer and adaptive attacks, including our novel PBA strategy. Our results emphasise the importance of a broad range of customised attacks despite apparent robustness according to standard evaluations.' volume: 162 URL: https://proceedings.mlr.press/v162/rusu22a.html PDF: https://proceedings.mlr.press/v162/rusu22a/rusu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-rusu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrei A family: Rusu - given: Dan Andrei family: Calian - given: Sven family: Gowal - given: Raia family: Hadsell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18910-18934 id: rusu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18910 lastpage: 18934 published: 2022-06-28 00:00:00 +0000 - title: 'Exploiting Independent Instruments: Identification and Distribution Generalization' abstract: 'Instrumental variable models allow us to identify a causal function between covariates $X$ and a response $Y$, even in the presence of unobserved confounding. Most of the existing estimators assume that the error term in the response $Y$ and the hidden confounders are uncorrelated with the instruments $Z$. This is often motivated by a graphical separation, an argument that also justifies independence. Positing an independence restriction, however, leads to strictly stronger identifiability results. We connect to the existing literature in econometrics and provide a practical method called HSIC-X for exploiting independence that can be combined with any gradient-based learning procedure. We see that even in identifiable settings, taking into account higher moments may yield better finite sample results. Furthermore, we exploit the independence for distribution generalization. 
We prove that the proposed estimator is invariant to distributional shifts on the instruments and worst-case optimal whenever these shifts are sufficiently strong. These results hold even in the under-identified case where the instruments are not sufficiently rich to identify the causal function.' volume: 162 URL: https://proceedings.mlr.press/v162/saengkyongam22a.html PDF: https://proceedings.mlr.press/v162/saengkyongam22a/saengkyongam22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-saengkyongam22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sorawit family: Saengkyongam - given: Leonard family: Henckel - given: Niklas family: Pfister - given: Jonas family: Peters editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18935-18958 id: saengkyongam22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18935 lastpage: 18958 published: 2022-06-28 00:00:00 +0000 - title: 'FedNL: Making Newton-Type Methods Applicable to Federated Learning' abstract: 'Inspired by recent work of Islamov et al (2021), we propose a family of Federated Newton Learn (\algname{FedNL}) methods, which we believe is a marked step in the direction of making second-order methods applicable to FL. In contrast to the aforementioned work, \algname{FedNL} employs a different Hessian learning technique which i) enhances privacy as it does not rely on the training data to be revealed to the coordinating server, ii) makes it applicable beyond generalized linear models, and iii) provably works with general contractive compression operators for compressing the local Hessians, such as Top-$K$ or Rank-$R$, which are vastly superior in practice. Notably, we do not need to rely on error feedback for our methods to work with contractive compressors. Moreover, we develop \algname{FedNL-PP}, \algname{FedNL-CR} and \algname{FedNL-LS}, which are variants of \algname{FedNL} that support partial participation, and globalization via cubic regularization and line search, respectively, and \algname{FedNL-BC}, which is a variant that can further benefit from bidirectional compression of gradients and models, i.e., smart uplink gradient and smart downlink model compression. We prove local convergence rates that are independent of the condition number, the number of training data points, and compression variance. Our communication efficient Hessian learning technique provably learns the Hessian at the optimum. Finally, we perform a variety of numerical experiments that show that our \algname{FedNL} methods have state-of-the-art communication complexity when compared to key baselines.' 
volume: 162 URL: https://proceedings.mlr.press/v162/safaryan22a.html PDF: https://proceedings.mlr.press/v162/safaryan22a/safaryan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-safaryan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mher family: Safaryan - given: Rustem family: Islamov - given: Xun family: Qian - given: Peter family: Richtarik editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 18959-19010 id: safaryan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 18959 lastpage: 19010 published: 2022-06-28 00:00:00 +0000 - title: 'Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences' abstract: 'We study the problem of $K$-armed dueling bandit for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pair of decision points queried in an online sequential manner. We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits which allows us to improve many existing results in dueling bandits. In particular, we give the first best-of-both world result for the dueling bandits regret minimization problem—a unified framework that is guaranteed to perform optimally for both stochastic and adversarial preferences simultaneously. Moreover, our algorithm is also the first to achieve an optimal $O(\sum_{i = 1}^K \frac{\log T}{\Delta_i})$ regret bound against the Condorcet-winner benchmark, which scales optimally both in terms of the arm-size $K$ and the instance-specific suboptimality gaps $\{\Delta_i\}_{i = 1}^K$. This resolves the long standing problem of designing an instancewise gap-dependent order optimal regret algorithm for dueling bandits (with matching lower bounds up to small constant factors). We further justify the robustness of our proposed algorithm by proving its optimal regret rate under adversarially corrupted preferences—this outperforms the existing state-of-the-art corrupted dueling results by a large margin. In summary, we believe our reduction idea will find a broader scope in solving a diverse class of dueling bandits setting, which are otherwise studied separately from multi-armed bandits with often more complex solutions and worse guarantees. The efficacy of our proposed algorithms are empirically corroborated against state-of-the art dueling bandit methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/saha22a.html PDF: https://proceedings.mlr.press/v162/saha22a/saha22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-saha22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aadirupa family: Saha - given: Pierre family: Gaillard editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19011-19026 id: saha22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19011 lastpage: 19026 published: 2022-06-28 00:00:00 +0000 - title: 'Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits' abstract: 'We study the problem of dynamic regret minimization in $K$-armed Dueling Bandits under non-stationary or time-varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary ‘win-loss’ feedback for this pair sampled from an underlying preference matrix at that round. We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with an $O(\sqrt{KT})$ regret bound. We next use similar algorithmic ideas to propose an efficient and provably optimal algorithm for dynamic-regret minimization under two notions of non-stationarities. In particular, we show $\tilde{O}(\sqrt{SKT})$ and $\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$ dynamic-regret guarantees, respectively, with $S$ being the total number of ‘effective-switches’ in the underlying preference relations and $V_T$ being a measure of ‘continuous-variation’ non-stationarity. These rates are provably optimal as justified with matching lower bound guarantees. Moreover, our proposed algorithms are flexible as they can be easily ‘blackboxed’ to yield dynamic regret guarantees for other notions of dueling bandits regret, including Condorcet regret, best-response bounds, and Borda regret. The complexity of these problems has not been studied prior to this work despite the practicality of non-stationary environments. Our results are corroborated with extensive simulations.' volume: 162 URL: https://proceedings.mlr.press/v162/saha22b.html PDF: https://proceedings.mlr.press/v162/saha22b/saha22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-saha22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aadirupa family: Saha - given: Shubham family: Gupta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19027-19049 id: saha22b issued: date-parts: - 2022 - 6 - 28 firstpage: 19027 lastpage: 19049 published: 2022-06-28 00:00:00 +0000 - title: 'Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers' abstract: 'Vision transformers using self-attention or its proposed alternatives have demonstrated promising results in many image-related tasks. However, the underpinning inductive bias of attention is not well understood. To address this issue, this paper analyzes attention through the lens of convex duality.
For the non-linear dot-product self-attention and alternative mechanisms such as MLP-mixer and Fourier Neural Operator (FNO), we derive equivalent finite-dimensional convex problems that are interpretable and solvable to global optimality. The convex programs lead to block nuclear-norm regularization that promotes low rank in the latent feature and token dimensions. In particular, we show how self-attention networks implicitly cluster the tokens based on their latent similarity. We conduct experiments for transferring a pre-trained transformer backbone for CIFAR-100 classification by fine-tuning a variety of convex attention heads. The results indicate the merits of the bias induced by attention compared with the existing MLP or linear heads.' volume: 162 URL: https://proceedings.mlr.press/v162/sahiner22a.html PDF: https://proceedings.mlr.press/v162/sahiner22a/sahiner22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sahiner22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Arda family: Sahiner - given: Tolga family: Ergen - given: Batu family: Ozturkler - given: John family: Pauly - given: Morteza family: Mardani - given: Mert family: Pilanci editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19050-19088 id: sahiner22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19050 lastpage: 19088 published: 2022-06-28 00:00:00 +0000 - title: 'Off-Policy Evaluation for Large Action Spaces via Embeddings' abstract: 'Off-policy evaluation (OPE) in contextual bandits has seen rapid adoption in real-world systems, since it enables offline evaluation of new policies using only historic log data. Unfortunately, when the number of actions is large, existing OPE estimators – most of which are based on inverse propensity score weighting – degrade severely and can suffer from extreme bias and variance. This foils the use of OPE in many applications from recommender systems to language models. To overcome this issue, we propose a new OPE estimator that leverages marginalized importance weights when action embeddings provide structure in the action space. We characterize the bias, variance, and mean squared error of the proposed estimator and analyze the conditions under which the action embedding provides statistical benefits over conventional estimators. In addition to the theoretical analysis, we find that the empirical performance improvement can be substantial, enabling reliable OPE even when existing estimators collapse due to a large number of actions.'
volume: 162 URL: https://proceedings.mlr.press/v162/saito22a.html PDF: https://proceedings.mlr.press/v162/saito22a/saito22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-saito22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuta family: Saito - given: Thorsten family: Joachims editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19089-19122 id: saito22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19089 lastpage: 19122 published: 2022-06-28 00:00:00 +0000 - title: 'Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training' abstract: 'Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, the QAT algorithm is formulated with provably minimum quantization noise at each step. In addition, we reveal limitations in common gradient estimation techniques in QAT and propose magnitude-aware differentiation as a remedy to further improve accuracy. Experimentally, OCTAV-enabled QAT achieves state-of-the-art accuracy on multiple tasks. These include training-from-scratch and retraining ResNets and MobileNets on ImageNet, and SQuAD fine-tuning using BERT models, where OCTAV-enabled QAT consistently preserves accuracy at low precision (4-to-6-bits). Our results require no modifications to the baseline training recipe, except for the insertion of quantization operations where appropriate.' volume: 162 URL: https://proceedings.mlr.press/v162/sakr22a.html PDF: https://proceedings.mlr.press/v162/sakr22a/sakr22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sakr22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Charbel family: Sakr - given: Steve family: Dai - given: Rangha family: Venkatesan - given: Brian family: Zimmer - given: William family: Dally - given: Brucek family: Khailany editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19123-19138 id: sakr22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19123 lastpage: 19138 published: 2022-06-28 00:00:00 +0000 - title: 'A Convergence Theory for SVGD in the Population Limit under Talagrand’s Inequality T1' abstract: 'Stein Variational Gradient Descent (SVGD) is an algorithm for sampling from a target density which is known up to a multiplicative constant. Although SVGD is a popular algorithm in practice, its theoretical study is limited to a few recent works. 
We study the convergence of SVGD in the population limit (i.e., with an infinite number of particles) to sample from a non-logconcave target distribution satisfying Talagrand’s inequality T1. We first establish the convergence of the algorithm. Then, we establish a dimension-dependent complexity bound in terms of the Kernelized Stein Discrepancy (KSD). Unlike existing works, we do not assume that the KSD is bounded along the trajectory of the algorithm. Our approach relies on interpreting SVGD as a gradient descent over a space of probability measures.' volume: 162 URL: https://proceedings.mlr.press/v162/salim22a.html PDF: https://proceedings.mlr.press/v162/salim22a/salim22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-salim22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adil family: Salim - given: Lukang family: Sun - given: Peter family: Richtarik editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19139-19152 id: salim22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19139 lastpage: 19152 published: 2022-06-28 00:00:00 +0000 - title: 'FITNESS: (Fine Tune on New and Similar Samples) to detect anomalies in streams with drift and outliers' abstract: 'Technology improvements have made it easier than ever to collect diverse telemetry at high resolution from any cyber or physical system, for both monitoring and control. In the domain of monitoring, anomaly detection has become an important problem in many research areas ranging from IoT and sensor networks to devOps. These systems operate in real, noisy and non-stationary environments. A fundamental question is then, ‘How to quickly spot anomalies in a data-stream, and differentiate them from either sudden or gradual drifts in the normal behaviour?’ Although several heuristics have been proposed for detecting anomalies on streams, no known method has formalized the desiderata and rigorously proven that they can be achieved. We begin by formalizing the problem as a sequential estimation task. We propose FITNESS (Fine Tune on New and Similar Samples), a flexible framework for detecting anomalies on data streams. We show that in the case when the data stream has a Gaussian distribution, FITNESS is provably both robust and adaptive. The core of our method is to fine-tune the anomaly detection system only on recent, similar examples, before predicting an anomaly score. We prove that this is sufficient for robustness and adaptivity. We further experimentally demonstrate that FITNESS is flexible in practice, i.e., it can convert existing offline AD algorithms into robust and adaptive online ones.' 
volume: 162 URL: https://proceedings.mlr.press/v162/sankararaman22a.html PDF: https://proceedings.mlr.press/v162/sankararaman22a/sankararaman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sankararaman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Abishek family: Sankararaman - given: Balakrishnan family: Narayanaswamy - given: Vikramank Y family: Singh - given: Zhao family: Song editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19153-19177 id: sankararaman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19153 lastpage: 19177 published: 2022-06-28 00:00:00 +0000 - title: 'The Algebraic Path Problem for Graph Metrics' abstract: 'Finding paths with optimal properties is a foundational problem in computer science. The notions of shortest paths (minimal sum of edge costs), minimax paths (minimal maximum edge weight), reliability of a path and many others all arise as special cases of the "algebraic path problem" (APP). Indeed, the APP formalizes the relation between different semirings such as min-plus, min-max and the distances they induce. We here clarify, for the first time, the relation between the potential distance and the log-semiring. We also define a new unifying family of algebraic structures that include all above-mentioned path problems as well as the commute cost and others as special or limiting cases. The family comprises not only semirings but also strong bimonoids (that is, semirings without distributivity). We call this new and very general distance the "log-norm distance". Finally, we derive some sufficient conditions which ensure that the APP associated with a semiring defines a metric over an arbitrary graph.' volume: 162 URL: https://proceedings.mlr.press/v162/sanmarti-n22a.html PDF: https://proceedings.mlr.press/v162/sanmarti-n22a/sanmarti-n22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sanmarti-n22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Enrique Fita family: Sanmartı́n - given: Sebastian family: Damrich - given: Fred family: Hamprecht editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19178-19204 id: sanmarti-n22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19178 lastpage: 19204 published: 2022-06-28 00:00:00 +0000 - title: 'LSB: Local Self-Balancing MCMC in Discrete Spaces' abstract: 'We present the Local Self-Balancing sampler (LSB), a local Markov Chain Monte Carlo (MCMC) method for sampling in purely discrete domains, which is able to autonomously adapt to the target distribution and to reduce the number of target evaluations required to converge. LSB is based on (i) a parametrization of locally balanced proposals, (ii) an objective function based on mutual information and (iii) a self-balancing learning procedure, which minimises the proposed objective to update the proposal parameters. 
Experiments on energy-based models and Markov networks show that LSB converges using a smaller number of queries to the oracle distribution compared to recent local MCMC samplers.' volume: 162 URL: https://proceedings.mlr.press/v162/sansone22a.html PDF: https://proceedings.mlr.press/v162/sansone22a/sansone22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sansone22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Emanuele family: Sansone editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19205-19220 id: sansone22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19205 lastpage: 19220 published: 2022-06-28 00:00:00 +0000 - title: 'PoF: Post-Training of Feature Extractor for Improving Generalization' abstract: 'It has been intensively investigated that the local shape, especially flatness, of the loss landscape near a minimum plays an important role in the generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search for a flatter minimum. The characteristics are two-fold: 1) the feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations that suggest flattening higher-layer parameter space, and 2) the perturbation range is determined in a data-driven manner aiming to reduce a part of test loss caused by the positive loss curvature. We provide a theoretical analysis that shows the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance against baseline methods on both CIFAR-10 and CIFAR-100 datasets for only 10-epoch post-training, and on the SVHN dataset for 50-epoch post-training.' volume: 162 URL: https://proceedings.mlr.press/v162/sato22a.html PDF: https://proceedings.mlr.press/v162/sato22a/sato22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sato22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ikuro family: Sato - given: Yamada family: Ryota - given: Masayuki family: Tanaka - given: Nakamasa family: Inoue - given: Rei family: Kawakami editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19221-19230 id: sato22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19221 lastpage: 19230 published: 2022-06-28 00:00:00 +0000 - title: 'Re-evaluating Word Mover’s Distance' abstract: 'The word mover’s distance (WMD) is a fundamental technique for measuring the similarity of two documents. The crux of WMD is that it can take advantage of the underlying geometry of the word space by employing an optimal transport formulation. The original study on WMD reported that WMD outperforms classical baselines such as bag-of-words (BOW) and TF-IDF by significant margins in various datasets. In this paper, we point out that the evaluation in the original study could be misleading. 
We re-evaluate the performance of WMD and the classical baselines and find that the classical baselines are competitive with WMD if we employ an appropriate preprocessing, i.e., L1 normalization. In addition, we introduce an analogy between WMD and L1-normalized BOW and find that not only the performance of WMD but also the distance values resemble those of BOW in high-dimensional spaces.' volume: 162 URL: https://proceedings.mlr.press/v162/sato22b.html PDF: https://proceedings.mlr.press/v162/sato22b/sato22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sato22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ryoma family: Sato - given: Makoto family: Yamada - given: Hisashi family: Kashima editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19231-19249 id: sato22b issued: date-parts: - 2022 - 6 - 28 firstpage: 19231 lastpage: 19249 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Contrastive Learning Requires Incorporating Inductive Biases' abstract: 'Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs. Recent attempts to theoretically explain the success of contrastive learning on downstream classification tasks prove guarantees depending on properties of augmentations and the value of contrastive loss of representations. We demonstrate that such analyses, which ignore inductive biases of the function class and training algorithm, cannot adequately explain the success of contrastive learning, even provably leading to vacuous guarantees in some settings. Extensive experiments on image and text domains highlight the ubiquity of this problem – different function classes and algorithms behave very differently on downstream tasks, despite having the same augmentations and contrastive losses. Theoretical analysis is presented for the class of linear representations, where incorporating inductive biases of the function class allows contrastive learning to work with less stringent conditions compared to prior analyses.' 
volume: 162 URL: https://proceedings.mlr.press/v162/saunshi22a.html PDF: https://proceedings.mlr.press/v162/saunshi22a/saunshi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-saunshi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nikunj family: Saunshi - given: Jordan family: Ash - given: Surbhi family: Goel - given: Dipendra family: Misra - given: Cyril family: Zhang - given: Sanjeev family: Arora - given: Sham family: Kakade - given: Akshay family: Krishnamurthy editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19250-19286 id: saunshi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19250 lastpage: 19286 published: 2022-06-28 00:00:00 +0000 - title: 'The Neural Race Reduction: Dynamics of Abstraction in Gated Networks' abstract: 'Our theoretical understanding of deep learning has not kept pace with its empirical success. While network architecture is known to be critical, we do not yet understand its effect on learned representations and network behavior, or how this architecture should reflect task structure. In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics within an architecture. Crucially, because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our analysis demonstrates that the learning dynamics in structured networks can be conceptualized as a neural race with an implicit bias towards shared representations, which then govern the model’s ability to systematically generalize, multi-task, and transfer. We validate our key insights on naturalistic datasets and with relaxed assumptions. Taken together, our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures and the role of modularity and compositionality in solving real-world problems. The code and results are available at https://www.saxelab.org/gated-dln.' volume: 162 URL: https://proceedings.mlr.press/v162/saxe22a.html PDF: https://proceedings.mlr.press/v162/saxe22a/saxe22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-saxe22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrew family: Saxe - given: Shagun family: Sodhani - given: Sam Jay family: Lewallen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19287-19309 id: saxe22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19287 lastpage: 19309 published: 2022-06-28 00:00:00 +0000 - title: 'Convergence Rates of Non-Convex Stochastic Gradient Descent Under a Generic Lojasiewicz Condition and Local Smoothness' abstract: 'Training over-parameterized neural networks involves the empirical minimization of highly non-convex objective functions. 
Recently, a large body of works provided theoretical evidence that, despite this non-convexity, properly initialized over-parameterized networks can converge to a zero training loss through the introduction of the Polyak-Lojasiewicz condition. However, these analyses are restricted to quadratic losses such as mean square error, and tend to indicate fast exponential convergence rates that are seldom observed in practice. In this work, we propose to extend these results by analyzing stochastic gradient descent under more generic Lojasiewicz conditions that are applicable to any convex loss function, thus extending the current theory to a larger panel of losses commonly used in practice such as cross-entropy. Moreover, our analysis provides high-probability bounds on the approximation error under sub-Gaussian gradient noise and only requires the local smoothness of the objective function, thus making it applicable to deep neural networks in realistic settings.' volume: 162 URL: https://proceedings.mlr.press/v162/scaman22a.html PDF: https://proceedings.mlr.press/v162/scaman22a/scaman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-scaman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kevin family: Scaman - given: Cedric family: Malherbe - given: Ludovic Dos family: Santos editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19310-19327 id: scaman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19310 lastpage: 19327 published: 2022-06-28 00:00:00 +0000 - title: 'An Asymptotic Test for Conditional Independence using Analytic Kernel Embeddings' abstract: 'We propose a new conditional dependence measure and a statistical test for conditional independence. The measure is based on the difference between analytic kernel embeddings of two well-suited distributions evaluated at a finite set of locations. We obtain its asymptotic distribution under the null hypothesis of conditional independence and design a consistent statistical test from it. We conduct a series of experiments showing that our new test outperforms state-of-the-art methods both in terms of type-I and type-II errors even in the high dimensional setting.' volume: 162 URL: https://proceedings.mlr.press/v162/scetbon22a.html PDF: https://proceedings.mlr.press/v162/scetbon22a/scetbon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-scetbon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Meyer family: Scetbon - given: Laurent family: Meunier - given: Yaniv family: Romano editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19328-19346 id: scetbon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19328 lastpage: 19346 published: 2022-06-28 00:00:00 +0000 - title: 'Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs' abstract: 'The ability to align points across two related yet incomparable point clouds (e.g. living in different spaces) plays an important role in machine learning. 
The Gromov-Wasserstein (GW) framework provides an increasingly popular answer to such problems, by seeking a low-distortion, geometry-preserving assignment between these points. As a non-convex, quadratic generalization of optimal transport (OT), GW is NP-hard. While practitioners often resort to solving GW approximately as a nested sequence of entropy-regularized OT problems, the cubic complexity (in the number $n$ of samples) of that approach is a roadblock. We show in this work how a recent variant of the OT problem that restricts the set of admissible couplings to those having a low-rank factorization is remarkably well suited to the resolution of GW: when applied to GW, we show that this approach is not only able to compute a stationary point of the GW problem in time $O(n^2)$, but also uniquely positioned to benefit from the knowledge that the initial cost matrices are low-rank, to yield a linear time $O(n)$ GW approximation. Our approach yields similar results, yet orders of magnitude faster computation than the SoTA entropic GW approaches, on both simulated and real data.' volume: 162 URL: https://proceedings.mlr.press/v162/scetbon22b.html PDF: https://proceedings.mlr.press/v162/scetbon22b/scetbon22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-scetbon22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Meyer family: Scetbon - given: Gabriel family: Peyré - given: Marco family: Cuturi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19347-19365 id: scetbon22b issued: date-parts: - 2022 - 6 - 28 firstpage: 19347 lastpage: 19365 published: 2022-06-28 00:00:00 +0000 - title: 'Streaming Inference for Infinite Feature Models' abstract: 'Unsupervised learning from a continuous stream of data is arguably one of the most common and most challenging problems facing intelligent agents. One class of unsupervised models, collectively termed feature models, attempts unsupervised discovery of latent features underlying the data and includes common models such as PCA, ICA, and NMF. However, if the data arrives in a continuous stream, determining the number of features is a significant challenge and the number may grow with time. In this work, we make feature models significantly more applicable to streaming data by imbuing them with the ability to create new features, online, in a probabilistic and principled manner. To achieve this, we derive a novel recursive form of the Indian Buffet Process, which we term the Recursive IBP (R-IBP). We demonstrate that R-IBP can be used as a prior for feature models to efficiently infer a posterior over an unbounded number of latent features, with quasilinear average time complexity and logarithmic average space complexity. We compare R-IBP to existing offline sampling and variational baselines in two feature models (Linear Gaussian and Factor Analysis) and demonstrate on synthetic and real data that R-IBP achieves comparable or better performance in significantly less time.' 
volume: 162 URL: https://proceedings.mlr.press/v162/schaeffer22a.html PDF: https://proceedings.mlr.press/v162/schaeffer22a/schaeffer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-schaeffer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rylan family: Schaeffer - given: Yilun family: Du - given: Gabrielle K family: Liu - given: Ila family: Fiete editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19366-19387 id: schaeffer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19366 lastpage: 19387 published: 2022-06-28 00:00:00 +0000 - title: 'Modeling Irregular Time Series with Continuous Recurrent Units' abstract: 'Recurrent neural networks (RNNs) are a popular choice for modeling sequential data. Modern RNN architectures assume constant time-intervals between observations. However, in many datasets (e.g. medical records) observation times are irregular and can carry important information. To address this challenge, we propose continuous recurrent units (CRUs) – a neural architecture that can naturally handle irregular intervals between observations. The CRU assumes a hidden state, which evolves according to a linear stochastic differential equation and is integrated into an encoder-decoder framework. The recursive computations of the CRU can be derived using the continuous-discrete Kalman filter and are in closed form. The resulting recurrent architecture has temporal continuity between hidden states and a gating mechanism that can optimally integrate noisy observations. We derive an efficient parameterization scheme for the CRU that leads to a fast implementation, f-CRU. We empirically study the CRU on a number of challenging datasets and find that it can interpolate irregular time series better than methods based on neural ordinary differential equations.' volume: 162 URL: https://proceedings.mlr.press/v162/schirmer22a.html PDF: https://proceedings.mlr.press/v162/schirmer22a/schirmer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-schirmer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mona family: Schirmer - given: Mazin family: Eltayeb - given: Stefan family: Lessmann - given: Maja family: Rudolph editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19388-19405 id: schirmer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19388 lastpage: 19405 published: 2022-06-28 00:00:00 +0000 - title: 'Structure Preserving Neural Networks: A Case Study in the Entropy Closure of the Boltzmann Equation' abstract: 'In this paper, we explore applications of deep learning in statistical physics. We choose the Boltzmann equation as a typical example, where neural networks serve as a closure to its moment system. We present two types of neural networks to embed the convexity of entropy and to preserve the minimum entropy principle and intrinsic mathematical structures of the moment system of the Boltzmann equation. 
We derive an error bound for the generalization gap of convex neural networks which are trained in Sobolev norm and use the results to construct data sampling methods for neural network training. Numerical experiments demonstrate that the neural entropy closure is significantly faster than classical optimizers while maintaining sufficient accuracy.' volume: 162 URL: https://proceedings.mlr.press/v162/schotthofer22a.html PDF: https://proceedings.mlr.press/v162/schotthofer22a/schotthofer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-schotthofer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Steffen family: Schotthöfer - given: Tianbai family: Xiao - given: Martin family: Frank - given: Cory family: Hauck editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19406-19433 id: schotthofer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19406 lastpage: 19433 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification' abstract: 'The reliability of neural networks is essential for their use in safety-critical applications. Existing approaches generally aim at improving the robustness of neural networks to either real-world distribution shifts (e.g., common corruptions and perturbations, spatial transformations, and natural adversarial examples) or worst-case distribution shifts (e.g., optimized adversarial examples). In this work, we propose the Decision Region Quantification (DRQ) algorithm to improve the robustness of any differentiable pre-trained model against both real-world and worst-case distribution shifts in the data. DRQ analyzes the robustness of local decision regions in the vicinity of a given data point to make more reliable predictions. We theoretically motivate the DRQ algorithm by showing that it effectively smooths spurious local extrema in the decision surface. Furthermore, we propose an implementation using targeted and untargeted adversarial attacks. An extensive empirical evaluation shows that DRQ increases the robustness of adversarially and non-adversarially trained models against real-world and worst-case distribution shifts on several computer vision benchmark datasets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/schwinn22a.html PDF: https://proceedings.mlr.press/v162/schwinn22a/schwinn22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-schwinn22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Leo family: Schwinn - given: Leon family: Bungert - given: An family: Nguyen - given: René family: Raab - given: Falk family: Pulsmeyer - given: Doina family: Precup - given: Bjoern family: Eskofier - given: Dario family: Zanca editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19434-19449 id: schwinn22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19434 lastpage: 19449 published: 2022-06-28 00:00:00 +0000 - title: 'Symmetric Machine Theory of Mind' abstract: 'Theory of mind, the ability to model others’ thoughts and desires, is a cornerstone of human social intelligence. This makes it an important challenge for the machine learning community, but previous works mainly attempt to design agents that model the "mental state" of others as passive observers or in specific predefined roles, such as in speaker-listener scenarios. In contrast, we propose to model machine theory of mind in a more general symmetric scenario. We introduce a multi-agent environment SymmToM where, like in real life, all agents can speak, listen, see other agents, and move freely through the world. Effective strategies to maximize an agent’s reward require it to develop a theory of mind. We show that reinforcement learning agents that model the mental states of others achieve significant performance improvements over agents with no such theory of mind model. Importantly, our best agents still fail to achieve performance comparable to agents with access to the gold-standard mental state of other agents, demonstrating that the modeling of theory of mind in multi-agent scenarios is very much an open challenge.' volume: 162 URL: https://proceedings.mlr.press/v162/sclar22a.html PDF: https://proceedings.mlr.press/v162/sclar22a/sclar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sclar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Melanie family: Sclar - given: Graham family: Neubig - given: Yonatan family: Bisk editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19450-19466 id: sclar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19450 lastpage: 19466 published: 2022-06-28 00:00:00 +0000 - title: 'Data-SUITE: Data-centric identification of in-distribution incongruous examples' abstract: 'Systematic quantification of data quality is critical for consistent model performance. Prior works have focused on out-of-distribution data. Instead, we tackle an understudied yet equally important problem of characterizing incongruous regions of in-distribution (ID) data, which may arise from feature space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE: a data-centric AI framework to identify these regions, independent of a task-specific model. 
Data-SUITE leverages copula modeling, representation learning, and conformal prediction to build feature-wise confidence interval estimators based on a set of training instances. These estimators can be used to evaluate the congruence of test instances with respect to the training set, to answer two practically useful questions: (1) which test instances will be reliably predicted by a model trained with the training instances? and (2) can we identify incongruous regions of the feature space so that data owners understand the data’s limitations or guide future data collection? We empirically validate Data-SUITE’s performance and coverage guarantees and demonstrate on cross-site medical data, biased data, and data with concept drift, that Data-SUITE best identifies ID regions where a downstream model may be reliable (independent of said model). We also illustrate how these identified regions can provide insights into datasets and highlight their limitations.' volume: 162 URL: https://proceedings.mlr.press/v162/seedat22a.html PDF: https://proceedings.mlr.press/v162/seedat22a/seedat22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-seedat22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nabeel family: Seedat - given: Jonathan family: Crabbé - given: Mihaela prefix: van der family: Schaar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19467-19496 id: seedat22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19467 lastpage: 19496 published: 2022-06-28 00:00:00 +0000 - title: 'Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations' abstract: 'Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare by assisting decision-makers to answer "what-if" questions. Existing causal inference approaches typically consider regular, discrete-time intervals between observations and treatment decisions and hence are unable to naturally model irregularly sampled data, which is the common setting in practice. To handle arbitrary observation patterns, we interpret the data as samples from an underlying continuous-time process and propose to model its latent trajectory explicitly using the mathematics of controlled differential equations. This leads to a new approach, the Treatment Effect Neural Controlled Differential Equation (TE-CDE), that allows the potential outcomes to be evaluated at any time point. In addition, adversarial training is used to adjust for time-dependent confounding which is critical in longitudinal settings and is an added challenge not encountered in conventional time series. To assess solutions to this problem, we propose a controllable simulation environment based on a model of tumor growth for a range of scenarios with irregular sampling reflective of a variety of clinical scenarios. TE-CDE consistently outperforms existing approaches in all scenarios with irregular sampling.' 
volume: 162 URL: https://proceedings.mlr.press/v162/seedat22b.html PDF: https://proceedings.mlr.press/v162/seedat22b/seedat22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-seedat22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nabeel family: Seedat - given: Fergus family: Imrie - given: Alexis family: Bellot - given: Zhaozhi family: Qian - given: Mihaela prefix: van der family: Schaar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19497-19521 id: seedat22b issued: date-parts: - 2022 - 6 - 28 firstpage: 19497 lastpage: 19521 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization' abstract: 'Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural networks due to the famous result by Jacot et al. (2018): in the infinite-width limit, the NTK is deterministic and constant during training. However, this result cannot explain the behavior of deep networks, since it generally does not hold if depth and width tend to infinity simultaneously. In this paper, we study the NTK of fully-connected ReLU networks with depth comparable to width. We prove that the NTK properties depend significantly on the depth-to-width ratio and the distribution of parameters at initialization. In fact, our results indicate the importance of the three phases in the hyperparameter space identified in Poole et al. (2016): ordered, chaotic and the edge of chaos (EOC). We derive exact expressions for the NTK dispersion in the infinite-depth-and-width limit in all three phases and conclude that the NTK variability grows exponentially with depth at the EOC and in the chaotic phase but not in the ordered phase. We also show that the NTK of deep networks may stay constant during training only in the ordered phase and discuss how the structure of the NTK matrix changes during training.' volume: 162 URL: https://proceedings.mlr.press/v162/seleznova22a.html PDF: https://proceedings.mlr.press/v162/seleznova22a/seleznova22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-seleznova22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mariia family: Seleznova - given: Gitta family: Kutyniok editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19522-19560 id: seleznova22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19522 lastpage: 19560 published: 2022-06-28 00:00:00 +0000 - title: 'Reinforcement Learning with Action-Free Pre-Training from Videos' abstract: 'Recent unsupervised pre-training methods have shown to be effective on language and vision domains by learning useful representations for multiple downstream tasks. In this paper, we investigate if such unsupervised pre-training methods can also be effective for vision-based reinforcement learning (RL). To this end, we introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos. 
Our framework consists of two phases: we pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models on unseen environments. To incorporate additional action inputs during fine-tuning, we introduce a new architecture that stacks an action-conditional latent prediction model on top of the pre-trained action-free prediction model. Moreover, for better exploration, we propose a video-based intrinsic bonus that leverages pre-trained representations. We demonstrate that our framework significantly improves both the final performance and sample efficiency of vision-based RL in a variety of manipulation and locomotion tasks. Code is available at https://github.com/younggyoseo/apv.' volume: 162 URL: https://proceedings.mlr.press/v162/seo22a.html PDF: https://proceedings.mlr.press/v162/seo22a/seo22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-seo22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Younggyo family: Seo - given: Kimin family: Lee - given: Stephen L family: James - given: Pieter family: Abbeel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19561-19579 id: seo22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19561 lastpage: 19579 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation' abstract: 'We consider model-based multi-agent reinforcement learning, where the environment transition model is unknown and can only be learned via expensive interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can efficiently balance exploration, i.e., learning about the environment, and exploitation, i.e., achieving good equilibrium performance in the underlying general-sum Markov game. H-MARL builds high-probability confidence intervals around the unknown transition model and sequentially updates them based on newly observed data. Using these, it constructs an optimistic hallucinated game for the agents for which equilibrium policies are computed at each round. We consider general statistical models (e.g., Gaussian processes, deep ensembles, etc.) and policy classes (e.g., deep neural networks), and theoretically analyze our approach by bounding the agents’ dynamic regret. Moreover, we provide a convergence rate to the equilibria of the underlying Markov game. We demonstrate our approach experimentally on an autonomous driving simulation benchmark. H-MARL learns successful equilibrium policies after a few interactions with the environment and can significantly improve the performance compared to non-optimistic exploration methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/sessa22a.html PDF: https://proceedings.mlr.press/v162/sessa22a/sessa22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sessa22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pier Giuseppe family: Sessa - given: Maryam family: Kamgarpour - given: Andreas family: Krause editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19580-19597 id: sessa22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19580 lastpage: 19597 published: 2022-06-28 00:00:00 +0000 - title: 'Selective Regression under Fairness Criteria' abstract: 'Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria are met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/shah22a.html PDF: https://proceedings.mlr.press/v162/shah22a/shah22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shah22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Abhin family: Shah - given: Yuheng family: Bu - given: Joshua K family: Lee - given: Subhro family: Das - given: Rameswar family: Panda - given: Prasanna family: Sattigeri - given: Gregory W family: Wornell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19598-19615 id: shah22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19598 lastpage: 19615 published: 2022-06-28 00:00:00 +0000 - title: 'Utility Theory for Sequential Decision Making' abstract: 'The von Neumann-Morgenstern (VNM) utility theorem shows that under certain axioms of rationality, decision-making is reduced to maximizing the expectation of some utility function. We extend these axioms to increasingly structured sequential decision making settings and identify the structure of the corresponding utility functions. 
In particular, we show that memoryless preferences lead to a utility in the form of a per transition reward and multiplicative factor on the future return. This result motivates a generalization of Markov Decision Processes (MDPs) with this structure on the agent’s returns, which we call Affine-Reward MDPs. A stronger constraint on preferences is needed to recover the commonly used cumulative sum of scalar rewards in MDPs. A yet stronger constraint simplifies the utility function for goal-seeking agents in the form of a difference in some function of states that we call potential functions. Our necessary and sufficient conditions demystify the reward hypothesis that underlies the design of rational agents in reinforcement learning by adding an axiom to the VNM rationality axioms and motivate new directions for AI research involving sequential decision making.' volume: 162 URL: https://proceedings.mlr.press/v162/shakerinava22a.html PDF: https://proceedings.mlr.press/v162/shakerinava22a/shakerinava22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shakerinava22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mehran family: Shakerinava - given: Siamak family: Ravanbakhsh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19616-19625 id: shakerinava22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19616 lastpage: 19625 published: 2022-06-28 00:00:00 +0000 - title: 'Translating Robot Skills: Learning Unsupervised Skill Correspondences Across Robots' abstract: 'In this paper, we explore how we can endow robots with the ability to learn correspondences between their own skills, and those of morphologically different robots in different domains, in an entirely unsupervised manner. Our key insight is that morphologically different robots use similar task strategies to solve similar tasks. Based on this insight, we frame learning skill correspondences as a problem of matching distributions of sequences of skills across robots. We then present an unsupervised objective that encourages a learnt skill translation model to match these distributions across domains, inspired by recent advances in unsupervised machine translation. Our approach is able to learn semantically meaningful correspondences between skills across multiple robot-robot and human-robot domain pairs despite being completely unsupervised. Further, the learnt correspondences enable the transfer of task strategies across robots and domains. We present dynamic visualizations of our results at https://sites.google.com/view/translatingrobotskills/home.' 
volume: 162 URL: https://proceedings.mlr.press/v162/shankar22a.html PDF: https://proceedings.mlr.press/v162/shankar22a/shankar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shankar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tanmay family: Shankar - given: Yixin family: Lin - given: Aravind family: Rajeswaran - given: Vikash family: Kumar - given: Stuart family: Anderson - given: Jean family: Oh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19626-19644 id: shankar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19626 lastpage: 19644 published: 2022-06-28 00:00:00 +0000 - title: 'A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning' abstract: 'While reinforcement learning (RL) provides a framework for learning through trial and error, translating RL algorithms into the real world has remained challenging. A major hurdle to real-world application arises from the development of algorithms in an episodic setting where the environment is reset after every trial, in contrast with the continual and non-episodic nature of the real world encountered by embodied agents such as humans and robots. Enabling agents to learn behaviors autonomously in such non-episodic environments requires that the agent be able to conduct its own trials. Prior works have considered an alternating approach where a forward policy learns to solve the task and the backward policy learns to reset the environment, but what initial state distribution should the backward policy reset the agent to? Assuming access to a few demonstrations, we propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations. This keeps the agent close to the task-relevant states, allowing for a mix of easy and difficult starting states for the forward policy. Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks from the EARL benchmark, with 40% gains on the hardest task, while making fewer assumptions than prior works.' 
volume: 162 URL: https://proceedings.mlr.press/v162/sharma22a.html PDF: https://proceedings.mlr.press/v162/sharma22a/sharma22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sharma22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Archit family: Sharma - given: Rehaan family: Ahmad - given: Chelsea family: Finn editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19645-19657 id: sharma22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19645 lastpage: 19657 published: 2022-06-28 00:00:00 +0000 - title: 'Content Addressable Memory Without Catastrophic Forgetting by Heteroassociation with a Fixed Scaffold' abstract: 'Content-addressable memory (CAM) networks, so-called because stored items can be recalled by partial or corrupted versions of the items, exhibit near-perfect recall of a small number of information-dense patterns below capacity and a ‘memory cliff’ beyond, such that inserting a single additional pattern results in catastrophic loss of all stored patterns. We propose a novel CAM architecture, Memory Scaffold with Heteroassociation (MESH), which factorizes the problems of internal attractor dynamics and association with external content to generate a CAM continuum without a memory cliff: Small numbers of patterns are stored with complete information recovery matching standard CAMs, while inserting more patterns still results in partial recall of every pattern, with a graceful trade-off between pattern number and pattern richness. Motivated by the architecture of the Entorhinal-Hippocampal memory circuit in the brain, MESH is a tripartite architecture with pairwise interactions that uses a predetermined set of internally stabilized states together with heteroassociation between the internal states and arbitrary external patterns. We show analytically and experimentally that for any number of stored patterns, MESH nearly saturates the total information bound (given by the number of synapses) for CAM networks, outperforming all existing CAM models.' volume: 162 URL: https://proceedings.mlr.press/v162/sharma22b.html PDF: https://proceedings.mlr.press/v162/sharma22b/sharma22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sharma22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sugandha family: Sharma - given: Sarthak family: Chandra - given: Ila family: Fiete editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19658-19682 id: sharma22b issued: date-parts: - 2022 - 6 - 28 firstpage: 19658 lastpage: 19682 published: 2022-06-28 00:00:00 +0000 - title: 'Federated Minimax Optimization: Improved Convergence Analyses and Algorithms' abstract: 'In this paper, we consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications, such as GANs. Large-scale edge-based collection of training data in these applications calls for communication-efficient distributed optimization algorithms, such as those used in federated learning, to process the data. 
In this paper, we analyze local stochastic gradient descent ascent (SGDA), the local-update version of the SGDA algorithm. SGDA is the core algorithm used in minimax optimization, but it is not well-understood in a distributed setting. We prove that Local SGDA has order-optimal sample complexity for several classes of nonconvex-concave and nonconvex-nonconcave minimax problems, and also enjoys linear speedup with respect to the number of clients. We provide a novel and tighter analysis, which improves the convergence and communication guarantees in the existing literature. For nonconvex-PL and nonconvex-one-point-concave functions, we improve the existing complexity results for centralized minimax problems. Furthermore, we propose a momentum-based local-update algorithm, which has the same convergence guarantees, but outperforms Local SGDA as demonstrated in our experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/sharma22c.html PDF: https://proceedings.mlr.press/v162/sharma22c/sharma22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sharma22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pranay family: Sharma - given: Rohan family: Panda - given: Gauri family: Joshi - given: Pramod family: Varshney editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19683-19730 id: sharma22c issued: date-parts: - 2022 - 6 - 28 firstpage: 19683 lastpage: 19730 published: 2022-06-28 00:00:00 +0000 - title: 'DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning' abstract: 'The application of an ensemble of neural networks is becoming an imminent tool for advancing state-of-the-art deep reinforcement learning algorithms. However, training these large numbers of neural networks in the ensemble has an exceedingly high computation cost which may become a hindrance in training large-scale systems. In this paper, we propose DNS: a Determinantal Point Process based Neural Network Sampler that specifically uses k-DPP to sample a subset of neural networks for backpropagation at every training step thus significantly reducing the training time and computation cost. We integrated DNS in REDQ for continuous control tasks and evaluated on MuJoCo environments. Our experiments show that DNS augmented REDQ matches the baseline REDQ in terms of average cumulative reward and achieves this using less than 50% computation when measured in FLOPS. 
The code is available at https://github.com/IntelLabs/DNS' volume: 162 URL: https://proceedings.mlr.press/v162/sheikh22a.html PDF: https://proceedings.mlr.press/v162/sheikh22a/sheikh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sheikh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hassam family: Sheikh - given: Kizza family: Frisbee - given: Mariano family: Phielipp editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19731-19746 id: sheikh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19731 lastpage: 19746 published: 2022-06-28 00:00:00 +0000 - title: 'Instance Dependent Regret Analysis of Kernelized Bandits' abstract: 'We study the problem of designing an adaptive strategy for querying a noisy zeroth-order-oracle to efficiently learn about the optimizer of an unknown function $f$. To make the problem tractable, we assume that $f$ lies in the reproducing kernel Hilbert space (RKHS) associated with a known kernel $K$, with its norm bounded by $M<\infty$. Prior results, working in a minimax framework, have characterized the worst-case (over all functions in the problem class) limits on regret achievable by any algorithm, and have constructed algorithms with matching (modulo polylogarithmic factors) worst-case performance for the Matern family of kernels. These results suffer from two drawbacks. First, the minimax lower bound gives limited information about the limits of regret achievable by commonly used algorithms on a specific problem instance $f$. Second, the existing upper bound analysis fails to adapt to easier problem instances within the function class. Our work takes steps to address both these issues. First, we derive instance-dependent regret lower bounds for algorithms with uniformly (over the function class) vanishing normalized cumulative regret. Our result, valid for several practically relevant kernelized bandits algorithms, such as, GP-UCB, GP-TS and SupKernelUCB, identifies a fundamental complexity measure associated with every problem instance. We then address the second issue, by proposing a new minimax near-optimal algorithm that also adapts to easier problem instances.' volume: 162 URL: https://proceedings.mlr.press/v162/shekhar22a.html PDF: https://proceedings.mlr.press/v162/shekhar22a/shekhar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shekhar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shubhanshu family: Shekhar - given: Tara family: Javidi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19747-19772 id: shekhar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19747 lastpage: 19772 published: 2022-06-28 00:00:00 +0000 - title: 'Data Augmentation as Feature Manipulation' abstract: 'Data augmentation is a cornerstone of the machine learning pipeline, yet its theoretical underpinnings remain unclear. Is it merely a way to artificially augment the data set size? Or is it about encouraging the model to satisfy certain invariances? 
In this work we consider another angle, and we study the effect of data augmentation on the dynamic of the learning process. We find that data augmentation can alter the relative importance of various features, effectively making certain informative but hard to learn features more likely to be captured in the learning process. Importantly, we show that this effect is more pronounced for non-linear models, such as neural networks. Our main contribution is a detailed analysis of data augmentation on the learning dynamic for a two layer convolutional neural network in the recently proposed multi-view model by Z. Allen-Zhu and Y. Li. We complement this analysis with further experimental evidence that data augmentation can be viewed as a form of feature manipulation.' volume: 162 URL: https://proceedings.mlr.press/v162/shen22a.html PDF: https://proceedings.mlr.press/v162/shen22a/shen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruoqi family: Shen - given: Sebastien family: Bubeck - given: Suriya family: Gunasekar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19773-19808 id: shen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19773 lastpage: 19808 published: 2022-06-28 00:00:00 +0000 - title: 'Metric-Fair Active Learning' abstract: 'Active learning has become a prevalent technique for designing label-efficient algorithms, where the central principle is to only query and fit “informative” labeled instances. It is, however, known that an active learning algorithm may incur unfairness due to such instance selection procedure. In this paper, we henceforth study metric-fair active learning of homogeneous halfspaces, and show that under the distribution-dependent PAC learning model, fairness and label efficiency can be achieved simultaneously. We further propose two extensions of our main results: 1) we show that it is possible to make the algorithm robust to the adversarial noise – one of the most challenging noise models in learning theory; and 2) it is possible to significantly improve the label complexity when the underlying halfspace is sparse.' 
volume: 162 URL: https://proceedings.mlr.press/v162/shen22b.html PDF: https://proceedings.mlr.press/v162/shen22b/shen22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shen22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jie family: Shen - given: Nan family: Cui - given: Jing family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19809-19826 id: shen22b issued: date-parts: - 2022 - 6 - 28 firstpage: 19809 lastpage: 19826 published: 2022-06-28 00:00:00 +0000 - title: 'PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs' abstract: 'Steerable models can provide very general and flexible equivariance by formulating equivariance requirements in the language of representation theory and feature fields, which has been recognized to be effective for many vision tasks. However, deriving steerable models for 3D rotations is much more difficult than that in the 2D case, due to more complicated mathematics of 3D rotations. In this work, we employ partial differential operators (PDOs) to model 3D filters, and derive general steerable 3D CNNs, which are called PDO-s3DCNNs. We prove that the equivariant filters are subject to linear constraints, which can be solved efficiently under various conditions. As far as we know, PDO-s3DCNNs are the most general steerable CNNs for 3D rotations, in the sense that they cover all common subgroups of SO(3) and their representations, while existing methods can only be applied to specific groups and representations. Extensive experiments show that our models can preserve equivariance well in the discrete domain, and outperform previous works on SHREC’17 retrieval and ISBI 2012 segmentation tasks with a low network complexity.' volume: 162 URL: https://proceedings.mlr.press/v162/shen22c.html PDF: https://proceedings.mlr.press/v162/shen22c/shen22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shen22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhengyang family: Shen - given: Tao family: Hong - given: Qi family: She - given: Jinwen family: Ma - given: Zhouchen family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19827-19846 id: shen22c issued: date-parts: - 2022 - 6 - 28 firstpage: 19827 lastpage: 19846 published: 2022-06-28 00:00:00 +0000 - title: 'Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation' abstract: 'We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photos) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain. Conventional UDA methods (e.g., domain adversarial training) learn domain-invariant features to generalize from the source domain to the target domain. In this paper, we show that contrastive pre-training, which learns features on unlabeled source and target data and then fine-tunes on labeled source data, is competitive with strong UDA methods. 
However, we find that contrastive pre-training does not learn domain-invariant features, diverging from conventional UDA intuitions. We show theoretically that contrastive pre-training can learn features that vary substantially across domains but still generalize to the target domain, by disentangling domain and class information. We empirically validate our theory on benchmark vision datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/shen22d.html PDF: https://proceedings.mlr.press/v162/shen22d/shen22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shen22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kendrick family: Shen - given: Robbie M family: Jones - given: Ananya family: Kumar - given: Sang Michael family: Xie - given: Jeff Z. family: Haochen - given: Tengyu family: Ma - given: Percy family: Liang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19847-19878 id: shen22d issued: date-parts: - 2022 - 6 - 28 firstpage: 19847 lastpage: 19878 published: 2022-06-28 00:00:00 +0000 - title: 'Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense' abstract: 'Modern language models are vulnerable to backdoor attacks. An injected malicious token sequence (i.e., a trigger) can cause the compromised model to misbehave, raising security concerns. Trigger inversion is a widely-used technique for scanning backdoors in vision models. It cannot be directly applied to NLP models due to their discrete nature. In this paper, we develop a novel optimization method for NLP backdoor inversion. We leverage a dynamically reducing temperature coefficient in the softmax function to provide changing loss landscapes to the optimizer such that the process gradually focuses on the ground truth trigger, which is denoted as a one-hot value in a convex hull. Our method also features a temperature rollback mechanism to step away from local optima, exploiting the observation that local optima can be easily determined in NLP trigger inversion (while not in general optimization). We evaluate the technique on over 1600 models (with roughly half of them having injected backdoors) on 3 prevailing NLP tasks, with 4 different backdoor attacks and 7 architectures. Our results show that the technique is able to effectively and efficiently detect and remove backdoors, outperforming 5 baseline methods. The code is available at https://github.com/PurduePAML/DBS.' 
volume: 162 URL: https://proceedings.mlr.press/v162/shen22e.html PDF: https://proceedings.mlr.press/v162/shen22e/shen22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shen22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Guangyu family: Shen - given: Yingqi family: Liu - given: Guanhong family: Tao - given: Qiuling family: Xu - given: Zhuo family: Zhang - given: Shengwei family: An - given: Shiqing family: Ma - given: Xiangyu family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19879-19892 id: shen22e issued: date-parts: - 2022 - 6 - 28 firstpage: 19879 lastpage: 19892 published: 2022-06-28 00:00:00 +0000 - title: 'Staged Training for Transformer Language Models' abstract: 'The current standard approach to scaling transformer language models trains each model size from a different random initialization. As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a "growth operator" to increase the model depth and width. By initializing each stage with the output of the previous one, the training process effectively re-uses the compute from prior stages and becomes more efficient. Our growth operators each take as input the entire training state (including model parameters, optimizer state, learning rate schedule, etc.) and output a new training state from which training continues. We identify two important properties of these growth operators, namely that they preserve both the loss and the “training dynamics” after applying the operator. While the loss-preserving property has been discussed previously, to the best of our knowledge this work is the first to identify the importance of preserving the training dynamics (the rate of decrease of the loss during training). To find the optimal schedule for stages, we use the scaling laws from (Kaplan et al., 2020) to find a precise schedule that gives the most compute saving by starting a new stage when training efficiency starts decreasing. We empirically validate our growth operators and staged training for autoregressive language models, showing up to 22% compute savings compared to a strong baseline trained from scratch. Our code is available at https://github.com/allenai/staged-training.' 
volume: 162 URL: https://proceedings.mlr.press/v162/shen22f.html PDF: https://proceedings.mlr.press/v162/shen22f/shen22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shen22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sheng family: Shen - given: Pete family: Walsh - given: Kurt family: Keutzer - given: Jesse family: Dodge - given: Matthew family: Peters - given: Iz family: Beltagy editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19893-19908 id: shen22f issued: date-parts: - 2022 - 6 - 28 firstpage: 19893 lastpage: 19908 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Network Approximation in Terms of Intrinsic Parameters' abstract: 'One of the arguments to explain the success of deep learning is the powerful approximation capacity of deep neural networks. Such capacity is generally accompanied by the explosive growth of the number of parameters, which, in turn, leads to high computational costs. It is of great interest to ask whether we can achieve successful deep learning with a small number of learnable parameters adapting to the target function. From an approximation perspective, this paper shows that the number of parameters that need to be learned can be significantly smaller than people typically expect. First, we theoretically design ReLU networks with a few learnable parameters to achieve an attractive approximation. We prove by construction that, for any Lipschitz continuous function $f$ on $[0,1]^d$ with a Lipschitz constant $\lambda>0$, a ReLU network with $n+2$ intrinsic parameters (those depending on $f$) can approximate $f$ with an exponentially small error $5 \lambda \sqrt{d} \, 2^{-n}$. Such a result is generalized to generic continuous functions. Furthermore, we show that the idea of learning a small number of parameters to achieve a good approximation can be numerically observed. We conduct several experiments to verify that training a small part of parameters can also achieve good results for classification problems if other parameters are pre-specified or pre-trained from a related problem.' volume: 162 URL: https://proceedings.mlr.press/v162/shen22g.html PDF: https://proceedings.mlr.press/v162/shen22g/shen22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shen22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zuowei family: Shen - given: Haizhao family: Yang - given: Shijun family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19909-19934 id: shen22g issued: date-parts: - 2022 - 6 - 28 firstpage: 19909 lastpage: 19934 published: 2022-06-28 00:00:00 +0000 - title: 'Gradient-Free Method for Heavily Constrained Nonconvex Optimization' abstract: 'Zeroth-order (ZO) method has been shown to be a powerful method for solving the optimization problem where explicit expression of the gradients is difficult or infeasible to obtain. 
Recently, due to the practical value of constrained problems, many ZO Frank-Wolfe and projected ZO methods have been proposed. However, in many applications, we may have a very large number of nonconvex white/black-box constraints, which makes the existing zeroth-order methods extremely inefficient (or even not working) since they need to query the function values of all the constraints and project the solution to the complicated feasible set. In this paper, to solve the nonconvex problem with a large number of white/black-box constraints, we propose a doubly stochastic zeroth-order gradient method (DSZOG) with momentum and an adaptive step size. Theoretically, we prove that DSZOG converges to an $\epsilon$-stationary point of the constrained problem. Experimental results in two applications demonstrate the superiority of our method in terms of training time and accuracy compared with other ZO methods for the constrained problem.' volume: 162 URL: https://proceedings.mlr.press/v162/shi22a.html PDF: https://proceedings.mlr.press/v162/shi22a/shi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wanli family: Shi - given: Hongchang family: Gao - given: Bin family: Gu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19935-19955 id: shi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 19935 lastpage: 19955 published: 2022-06-28 00:00:00 +0000 - title: 'Global Optimization of K-Center Clustering' abstract: 'The $k$-center problem is a well-known clustering method and can be formulated as a mixed-integer nonlinear programming problem. This work provides a practical global optimization algorithm for this task based on a reduced-space spatial branch and bound scheme. This algorithm can guarantee convergence to the global optimum by only branching on the centers of clusters, which is independent of the dataset’s cardinality. In addition, a set of feasibility-based bounds tightening techniques are proposed to narrow down the domain of centers and significantly accelerate the convergence. To demonstrate the capacity of this algorithm, we present computational results on 32 datasets. Notably, for the dataset with 14 million samples and 3 features, the serial implementation of the algorithm can converge to an optimality gap of 0.1% within 2 hours. Compared with a heuristic method, the global optimum obtained by our algorithm can reduce the objective function on average by 30.4%.' 
volume: 162 URL: https://proceedings.mlr.press/v162/shi22b.html PDF: https://proceedings.mlr.press/v162/shi22b/shi22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shi22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mingfei family: Shi - given: Kaixun family: Hua - given: Jiayang family: Ren - given: Yankai family: Cao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19956-19966 id: shi22b issued: date-parts: - 2022 - 6 - 28 firstpage: 19956 lastpage: 19966 published: 2022-06-28 00:00:00 +0000 - title: 'Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity' abstract: 'Offline or batch reinforcement learning seeks to learn a near-optimal policy using history data without active exploration of the environment. To counter the insufficient coverage and sample scarcity of many offline datasets, the principle of pessimism has been recently introduced to mitigate high bias of the estimated values. While pessimistic variants of model-based algorithms (e.g., value iteration with lower confidence bounds) have been theoretically investigated, their model-free counterparts — which do not require explicit model estimation — have not been adequately studied, especially in terms of sample efficiency. To address this inadequacy, we study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes, and characterize its sample complexity under the single-policy concentrability assumption which does not require the full coverage of the state-action space. In addition, a variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity. Altogether, this work highlights the efficiency of model-free algorithms in offline RL when used in conjunction with pessimism and variance reduction.' volume: 162 URL: https://proceedings.mlr.press/v162/shi22c.html PDF: https://proceedings.mlr.press/v162/shi22c/shi22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shi22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Laixi family: Shi - given: Gen family: Li - given: Yuting family: Wei - given: Yuxin family: Chen - given: Yuejie family: Chi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 19967-20025 id: shi22c issued: date-parts: - 2022 - 6 - 28 firstpage: 19967 lastpage: 20025 published: 2022-06-28 00:00:00 +0000 - title: 'Adversarial Masking for Self-Supervised Learning' abstract: 'We propose ADIOS, a masked image model (MIM) framework for self-supervised learning, which simultaneously learns a masking function and an image encoder using an adversarial objective. The image encoder is trained to minimise the distance between representations of the original and that of a masked image. The masking function, conversely, aims at maximising this distance. 
ADIOS consistently improves on state-of-the-art self-supervised learning (SSL) methods on a variety of tasks and datasets—including classification on ImageNet100 and STL10, transfer learning on CIFAR10/100, Flowers102 and iNaturalist, as well as robustness evaluated on the backgrounds challenge (Xiao et al., 2021)—while generating semantically meaningful masks. Unlike modern MIM models such as MAE, BEiT and iBOT, ADIOS does not rely on the image-patch tokenisation construction of Vision Transformers, and can be implemented with convolutional backbones. We further demonstrate that the masks learned by ADIOS are more effective in improving representation learning of SSL methods than masking schemes used in popular MIM models.' volume: 162 URL: https://proceedings.mlr.press/v162/shi22d.html PDF: https://proceedings.mlr.press/v162/shi22d/shi22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shi22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuge family: Shi - given: N family: Siddharth - given: Philip family: Torr - given: Adam R family: Kosiorek editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20026-20040 id: shi22d issued: date-parts: - 2022 - 6 - 28 firstpage: 20026 lastpage: 20040 published: 2022-06-28 00:00:00 +0000 - title: 'Visual Attention Emerges from Recurrent Sparse Reconstruction' abstract: 'Visual attention helps achieve robust perception under noise, corruption, and distribution shifts in human vision, which are areas where modern neural networks still fall short. We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity. Related features are grouped together via recurrent connections between neurons, with salient objects emerging via sparse regularization. VARS adopts an attractor network with recurrent connections that converges toward a stable pattern over time. Network layers are represented as ordinary differential equations (ODEs), formulating attention as a recurrent attractor network that equivalently optimizes the sparse reconstruction of input using a dictionary of “templates” encoding underlying patterns of data. We show that self-attention is a special case of VARS with a single-step optimization and no sparsity constraint. VARS can be readily used as a replacement for self-attention in popular vision transformers, consistently improving their robustness across various benchmarks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/shi22e.html PDF: https://proceedings.mlr.press/v162/shi22e/shi22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shi22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Baifeng family: Shi - given: Yale family: Song - given: Neel family: Joshi - given: Trevor family: Darrell - given: Xin family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20041-20056 id: shi22e issued: date-parts: - 2022 - 6 - 28 firstpage: 20041 lastpage: 20056 published: 2022-06-28 00:00:00 +0000 - title: 'A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes' abstract: 'We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. Existing works either assume no unmeasured confounders, or focus on settings where both the observation and the state spaces are tabular. In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy’s value and the observed data distribution. We next propose minimax estimation methods for learning these bridge functions, and construct three estimators based on these estimated bridge functions, corresponding to a value function-based estimator, a marginalized importance sampling estimator, and a doubly-robust estimator. Our proposal permits general function approximation and is thus applicable to settings with continuous or large observation/state spaces. The nonasymptotic and asymptotic properties of the proposed estimators are investigated in detail. A Python implementation of our proposal is available at https://github.com/jiaweihhuang/Confounded-POMDP-Exp.' volume: 162 URL: https://proceedings.mlr.press/v162/shi22f.html PDF: https://proceedings.mlr.press/v162/shi22f/shi22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shi22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chengchun family: Shi - given: Masatoshi family: Uehara - given: Jiawei family: Huang - given: Nan family: Jiang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20057-20094 id: shi22f issued: date-parts: - 2022 - 6 - 28 firstpage: 20057 lastpage: 20094 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Group Synchronization via Quadratic Programming' abstract: 'We propose a novel quadratic programming formulation for estimating the corruption levels in group synchronization, and use these estimates to solve this problem. Our objective function exploits the cycle consistency of the group and we thus refer to our method as detection and estimation of structural consistency (DESC). This general framework can be extended to other algebraic and geometric structures. 
Our formulation has the following advantages: it can tolerate corruption as high as the information-theoretic bound, it does not require a good initialization for the estimates of group elements, it has a simple interpretation, and under some mild conditions the global minimum of our objective function exactly recovers the corruption levels. We demonstrate the competitive accuracy of our approach on both synthetic and real data experiments of rotation averaging.' volume: 162 URL: https://proceedings.mlr.press/v162/shi22g.html PDF: https://proceedings.mlr.press/v162/shi22g/shi22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shi22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yunpeng family: Shi - given: Cole M family: Wyeth - given: Gilad family: Lerman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20095-20105 id: shi22g issued: date-parts: - 2022 - 6 - 28 firstpage: 20095 lastpage: 20105 published: 2022-06-28 00:00:00 +0000 - title: 'Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets' abstract: 'The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically assume known data alignment and compare such operators in a pointwise manner. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.' volume: 162 URL: https://proceedings.mlr.press/v162/shnitzer22a.html PDF: https://proceedings.mlr.press/v162/shnitzer22a/shnitzer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shnitzer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tal family: Shnitzer - given: Mikhail family: Yurochkin - given: Kristjan family: Greenewald - given: Justin M family: Solomon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20106-20124 id: shnitzer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20106 lastpage: 20124 published: 2022-06-28 00:00:00 +0000 - title: 'Scalable Computation of Causal Bounds' abstract: 'We consider the problem of computing bounds for causal inference problems with unobserved confounders, where identifiability does not hold. 
Existing non-parametric approaches for computing such bounds use linear programming (LP) formulations that quickly become intractable for existing solvers because the size of the LP grows exponentially in the number of edges in the underlying causal graph. We show that this LP can be significantly pruned by carefully considering the structure of the causal query, allowing us to compute bounds for significantly larger causal inference problems as compared to what is possible using existing techniques. This pruning procedure also allows us to compute the bounds in closed form for a special class of causal graphs and queries, which includes a well-studied family of problems where multiple confounded treatments influence an outcome. We also propose a very efficient greedy heuristic that produces very high quality bounds, and scales to problems that are several orders of magnitude larger than those for which the pruned LP can be solved.' volume: 162 URL: https://proceedings.mlr.press/v162/shridharan22a.html PDF: https://proceedings.mlr.press/v162/shridharan22a/shridharan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shridharan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Madhumitha family: Shridharan - given: Garud family: Iyengar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20125-20140 id: shridharan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20125 lastpage: 20140 published: 2022-06-28 00:00:00 +0000 - title: 'Bit Prioritization in Variational Autoencoders via Progressive Coding' abstract: 'The hierarchical variational autoencoder (HVAE) is a popular generative model used for many representation learning tasks. However, its application to image synthesis often yields models with poor sample quality. In this work, we treat image synthesis itself as a hierarchical representation learning problem and regularize an HVAE toward representations that improve the model’s image synthesis performance. We do so by leveraging the progressive coding hypothesis, which claims hierarchical latent variable models that are good at progressive lossy compression will generate high-quality samples. To test this hypothesis, we first show empirically that conventionally-trained HVAEs are not good progressive coders. We then propose a simple method that constrains the hierarchical representations to prioritize the encoding of information beneficial for lossy compression, and show that this modification leads to improved sample quality. Our work lends further support to the progressive coding hypothesis and demonstrates that this hypothesis should be exploited when designing variational autoencoders.' 
volume: 162 URL: https://proceedings.mlr.press/v162/shu22a.html PDF: https://proceedings.mlr.press/v162/shu22a/shu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rui family: Shu - given: Stefano family: Ermon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20141-20155 id: shu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20141 lastpage: 20155 published: 2022-06-28 00:00:00 +0000 - title: 'Fair Representation Learning through Implicit Path Alignment' abstract: 'We consider a fair representation learning perspective, where optimal predictors, on top of the data representation, are ensured to be invariant with respect to different sub-groups. Specifically, we formulate this intuition as a bi-level optimization, where the representation is learned in the outer-loop, and invariant optimal group predictors are updated in the inner-loop. Moreover, the proposed bi-level objective is demonstrated to fulfill the sufficiency rule, which is desirable in various practical scenarios but was not commonly studied in the fair learning. Besides, to avoid the high computational and memory cost of differentiating in the inner-loop of bi-level objective, we propose an implicit path alignment algorithm, which only relies on the solution of inner optimization and the implicit differentiation rather than the exact optimization path. We further analyze the error gap of the implicit approach and empirically validate the proposed method in both classification and regression settings. Experimental results show the consistently better trade-off in prediction performance and fairness measurement.' volume: 162 URL: https://proceedings.mlr.press/v162/shui22a.html PDF: https://proceedings.mlr.press/v162/shui22a/shui22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-shui22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Changjian family: Shui - given: Qi family: Chen - given: Jiaqi family: Li - given: Boyu family: Wang - given: Christian family: Gagné editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20156-20175 id: shui22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20156 lastpage: 20175 published: 2022-06-28 00:00:00 +0000 - title: 'Faster Algorithms for Learning Convex Functions' abstract: 'The task of approximating an arbitrary convex function arises in several learning problems such as convex regression, learning with a difference of convex (DC) functions, and learning Bregman or $f$-divergences. In this paper, we develop and analyze an approach for solving a broad range of convex function learning problems that is faster than state-of-the-art approaches. Our approach is based on a 2-block ADMM method where each block can be computed in closed form. 
For the task of convex Lipschitz regression, we establish that our proposed algorithm converges with iteration complexity of $ O(n\sqrt{d}/\epsilon)$ for a dataset $\bm X \in \mathbb R^{n\times d}$ and $\epsilon > 0$. Combined with per-iteration computation complexity, our method converges with the rate $O(n^3 d^{1.5}/\epsilon+n^2 d^{2.5}/\epsilon+n d^3/\epsilon)$. This new rate improves the state of the art rate of $O(n^5d^2/\epsilon)$ if $d = o( n^4)$. Further we provide similar solvers for DC regression and Bregman divergence learning. Unlike previous approaches, our method is amenable to the use of GPUs. We demonstrate on regression and metric learning experiments that our approach is over 100 times faster than existing approaches on some data sets, and produces results that are comparable to state of the art.' volume: 162 URL: https://proceedings.mlr.press/v162/siahkamari22a.html PDF: https://proceedings.mlr.press/v162/siahkamari22a/siahkamari22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-siahkamari22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ali family: Siahkamari - given: Durmus Alp Emre family: Acar - given: Christopher family: Liao - given: Kelly L family: Geyer - given: Venkatesh family: Saligrama - given: Brian family: Kulis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20176-20194 id: siahkamari22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20176 lastpage: 20194 published: 2022-06-28 00:00:00 +0000 - title: 'Coin Flipping Neural Networks' abstract: 'We show that neural networks with access to randomness can outperform deterministic networks by using amplification. We call such networks Coin-Flipping Neural Networks, or CFNNs. We show that a CFNN can approximate the indicator of a d-dimensional ball to arbitrary accuracy with only 2 layers and O(1) neurons, where a 2-layer deterministic network was shown to require Omega(e^d) neurons, an exponential improvement. We prove a highly non-trivial result, that for almost any classification problem, there exists a trivially simple network that solves it given a sufficiently powerful generator for the network’s weights. Combining these results we conjecture that for most classification problems, there is a CFNN which solves them with higher accuracy or fewer neurons than any deterministic network. Finally, we verify our proofs experimentally using novel CFNN architectures on CIFAR10 and CIFAR100, reaching an improvement of 9.25% from the baseline.' 
volume: 162 URL: https://proceedings.mlr.press/v162/sieradzki22a.html PDF: https://proceedings.mlr.press/v162/sieradzki22a/sieradzki22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sieradzki22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuval family: Sieradzki - given: Nitzan family: Hodos - given: Gal family: Yehuda - given: Assaf family: Schuster editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20195-20214 id: sieradzki22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20195 lastpage: 20214 published: 2022-06-28 00:00:00 +0000 - title: 'Reverse Engineering the Neural Tangent Kernel' abstract: 'The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature learning. To this end, we constructively prove that, with just an appropriate choice of activation function, any positive-semidefinite dot-product kernel can be realized as either the NNGP or neural tangent kernel of a fully-connected neural network with only one hidden layer. We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/simon22a.html PDF: https://proceedings.mlr.press/v162/simon22a/simon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-simon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: James Benjamin family: Simon - given: Sajant family: Anand - given: Mike family: Deweese editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20215-20231 id: simon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20215 lastpage: 20231 published: 2022-06-28 00:00:00 +0000 - title: 'Demystifying the Adversarial Robustness of Random Transformation Defenses' abstract: 'Neural networks’ lack of robustness against attacks raises concerns in security-sensitive settings such as autonomous vehicles. While many countermeasures may look promising, only a few withstand rigorous evaluation. Defenses using random transformations (RT) have shown impressive results, particularly BaRT (Raff et al., 2019) on ImageNet. However, this type of defense has not been rigorously evaluated, leaving its robustness properties poorly understood. Their stochastic properties make evaluation more challenging and render many proposed attacks on deterministic models inapplicable. First, we show that the BPDA attack (Athalye et al., 2018a) used in BaRT’s evaluation is ineffective and likely overestimates its robustness. We then attempt to construct the strongest possible RT defense through the informed selection of transformations and Bayesian optimization for tuning their parameters. 
Furthermore, we create the strongest possible attack to evaluate our RT defense. Our new attack vastly outperforms the baseline, reducing the accuracy by 83% compared to the 19% reduction by the commonly used EoT attack ($4.3\times$ improvement). Our result indicates that the RT defense on the Imagenette dataset (a ten-class subset of ImageNet) is not robust against adversarial examples. Extending the study further, we use our new attack to adversarially train the RT defense (called AdvRT), resulting in a large robustness gain. Code is available at https://github.com/wagner-group/demystify-random-transform.' volume: 162 URL: https://proceedings.mlr.press/v162/sitawarin22a.html PDF: https://proceedings.mlr.press/v162/sitawarin22a/sitawarin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sitawarin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chawin family: Sitawarin - given: Zachary J family: Golan-Strieb - given: David family: Wagner editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20232-20252 id: sitawarin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20232 lastpage: 20252 published: 2022-06-28 00:00:00 +0000 - title: 'Smoothed Adversarial Linear Contextual Bandits with Knapsacks' abstract: 'Many bandit problems are characterized by the learner making decisions under constraints. The learner in Linear Contextual Bandits with Knapsacks (LinCBwK) receives a resource consumption vector in addition to a scalar reward in each time step, which are both linear functions of the context corresponding to the chosen arm. For a fixed time horizon $T$, the goal of the learner is to maximize rewards while ensuring resource consumptions do not exceed a pre-specified budget. We present algorithms and characterize regret for LinCBwK in the smoothed setting where base context vectors are assumed to be perturbed by Gaussian noise. We consider both the stochastic and adversarial settings for the base contexts, and our analysis of stochastic LinCBwK can be viewed as a warm-up to the more challenging adversarial LinCBwK. For the stochastic setting, we obtain $O(\sqrt{T})$ additive regret bounds compared to the best context dependent fixed policy. The analysis combines ideas for greedy parameter estimation in \cite{kmrw18, siwb20} and the primal-dual paradigm first explored in \cite{agde17, agde14}. Our main contribution is an algorithm with $O(\log T)$ competitive ratio relative to the best context dependent fixed policy for the adversarial setting. The algorithm for the adversarial setting employs ideas from the primal-dual framework \cite{agde17, agde14} and a novel adaptation of the doubling trick \cite{isss19}.' 
volume: 162 URL: https://proceedings.mlr.press/v162/sivakumar22a.html PDF: https://proceedings.mlr.press/v162/sivakumar22a/sivakumar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sivakumar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Vidyashankar family: Sivakumar - given: Shiliang family: Zuo - given: Arindam family: Banerjee editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20253-20277 id: sivakumar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20253 lastpage: 20277 published: 2022-06-28 00:00:00 +0000 - title: 'GenLabel: Mixup Relabeling using Generative Models' abstract: 'Mixup is a data augmentation method that generates new data points by mixing a pair of input data. While mixup generally improves the prediction performance, it sometimes degrades the performance. In this paper, we first identify the main causes of this phenomenon by theoretically and empirically analyzing the mixup algorithm. To resolve this, we propose GenLabel, a simple yet effective relabeling algorithm designed for mixup. In particular, GenLabel helps the mixup algorithm correctly label mixup samples by learning the class-conditional data distribution using generative models. Via theoretical and empirical analysis, we show that mixup, when used together with GenLabel, can effectively resolve the aforementioned phenomenon, improving the accuracy of mixup-trained model.' volume: 162 URL: https://proceedings.mlr.press/v162/sohn22a.html PDF: https://proceedings.mlr.press/v162/sohn22a/sohn22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sohn22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jy-Yong family: Sohn - given: Liang family: Shang - given: Hongxu family: Chen - given: Jaekyun family: Moon - given: Dimitris family: Papailiopoulos - given: Kangwook family: Lee editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20278-20313 id: sohn22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20278 lastpage: 20313 published: 2022-06-28 00:00:00 +0000 - title: 'Communicating via Markov Decision Processes' abstract: 'We consider the problem of communicating exogenous information by means of Markov decision process trajectories. This setting, which we call a Markov coding game (MCG), generalizes both source coding and a large class of referential games. MCGs also isolate a problem that is important in decentralized control settings in which cheap-talk is not available—namely, they require balancing communication with the associated cost of communicating. We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME. Due to recent breakthroughs in approximation algorithms for minimum entropy coupling, MEME is not merely a theoretical algorithm, but can be applied to practical settings. 
Empirically, we show both that MEME is able to outperform a strong baseline on small MCGs and that MEME is able to achieve strong performance on extremely large MCGs. To the latter point, we demonstrate that MEME is able to losslessly communicate binary images via trajectories of Cartpole and Pong, while simultaneously achieving the maximal or near maximal expected returns, and that it is even capable of performing well in the presence of actuator noise.' volume: 162 URL: https://proceedings.mlr.press/v162/sokota22a.html PDF: https://proceedings.mlr.press/v162/sokota22a/sokota22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sokota22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samuel family: Sokota - given: Christian A Schroeder family: De Witt - given: Maximilian family: Igl - given: Luisa M family: Zintgraf - given: Philip family: Torr - given: Martin family: Strohmeier - given: Zico family: Kolter - given: Shimon family: Whiteson - given: Jakob family: Foerster editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20314-20328 id: sokota22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20314 lastpage: 20328 published: 2022-06-28 00:00:00 +0000 - title: 'The Multivariate Community Hawkes Model for Dependent Relational Events in Continuous-time Networks' abstract: 'The stochastic block model (SBM) is one of the most widely used generative models for network data. Many continuous-time dynamic network models are built upon the same assumption as the SBM: edges or events between all pairs of nodes are conditionally independent given the block or community memberships, which prevents them from reproducing higher-order motifs such as triangles that are commonly observed in real networks. We propose the multivariate community Hawkes (MULCH) model, an extremely flexible community-based model for continuous-time networks that introduces dependence between node pairs using structured multivariate Hawkes processes. We fit the model using a spectral clustering and likelihood-based local refinement procedure. We find that our proposed MULCH model is far more accurate than existing models both for predictive and generative tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/soliman22a.html PDF: https://proceedings.mlr.press/v162/soliman22a/soliman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-soliman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hadeel family: Soliman - given: Lingfei family: Zhao - given: Zhipeng family: Huang - given: Subhadeep family: Paul - given: Kevin S family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20329-20346 id: soliman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20329 lastpage: 20346 published: 2022-06-28 00:00:00 +0000 - title: 'Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning' abstract: 'In cooperative multi-agent reinforcement learning, the outcomes of agent-wise policies are highly stochastic due to the two sources of risk: (a) random actions taken by teammates and (b) random transition and rewards. Although the two sources have very distinct characteristics, existing frameworks are insufficient to control the risk-sensitivity of agent-wise policies in a disentangled manner. To this end, we propose Disentangled RIsk-sensitive Multi-Agent reinforcement learning (DRIMA) to separately access the risk sources. For example, our framework allows an agent to be optimistic with respect to teammates (who can prosocially adapt) but more risk-neutral with respect to the environment (which does not adapt). Our experiments demonstrate that DRIMA significantly outperforms prior state-of-the-art methods across various scenarios in the StarCraft Multi-agent Challenge environment. Notably, DRIMA shows robust performance where prior methods learn only a highly suboptimal policy, regardless of reward shaping, exploration scheduling, and noisy (random or adversarial) agents.' volume: 162 URL: https://proceedings.mlr.press/v162/son22a.html PDF: https://proceedings.mlr.press/v162/son22a/son22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-son22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kyunghwan family: Son - given: Junsu family: Kim - given: Sungsoo family: Ahn - given: Roben D Delos family: Reyes - given: Yung family: Yi - given: Jinwoo family: Shin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20347-20368 id: son22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20347 lastpage: 20368 published: 2022-06-28 00:00:00 +0000 - title: 'TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification' abstract: 'Learning unbiased node representations under class-imbalanced graph data is challenging due to interactions between adjacent nodes. Existing studies have in common that they compensate the minor class nodes ‘as a group’ according to their overall quantity (ignoring node connections in the graph), which inevitably increases the false positive cases for major nodes. 
We hypothesize that the increase in these false positive cases is highly affected by the label distribution around each node and confirm it experimentally. In addition, in order to handle this issue, we propose Topology-Aware Margin (TAM) to reflect local topology on the learning objective. Our method compares the connectivity pattern of each node with the class-averaged counterpart and adaptively adjusts the margin accordingly. Our method consistently exhibits superiority over the baselines on various node classification benchmark datasets with representative GNN architectures.' volume: 162 URL: https://proceedings.mlr.press/v162/song22a.html PDF: https://proceedings.mlr.press/v162/song22a/song22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-song22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jaeyun family: Song - given: Joonhyung family: Park - given: Eunho family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20369-20383 id: song22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20369 lastpage: 20383 published: 2022-06-28 00:00:00 +0000 - title: 'A General Recipe for Likelihood-free Bayesian Optimization' abstract: 'The acquisition function, a critical component in Bayesian optimization (BO), can often be written as the expectation of a utility function under a surrogate model. However, to ensure that acquisition functions are tractable to optimize, restrictions must be placed on the surrogate model and utility function. To extend BO to a broader class of models and utilities, we propose likelihood-free BO (LFBO), an approach based on likelihood-free inference. LFBO directly models the acquisition function without having to separately perform inference with a probabilistic surrogate model. We show that computing the acquisition function in LFBO can be reduced to optimizing a weighted classification problem, which extends an existing likelihood-free density ratio estimation method related to probability of improvement (PI). By choosing the utility function for expected improvement (EI), LFBO outperforms the aforementioned method, as well as various state-of-the-art black-box optimization methods on several real-world optimization problems. LFBO can also leverage composite structures of the objective function, which further improves its regret by several orders of magnitude.' 
volume: 162 URL: https://proceedings.mlr.press/v162/song22b.html PDF: https://proceedings.mlr.press/v162/song22b/song22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-song22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiaming family: Song - given: Lantao family: Yu - given: Willie family: Neiswanger - given: Stefano family: Ermon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20384-20404 id: song22b issued: date-parts: - 2022 - 6 - 28 firstpage: 20384 lastpage: 20404 published: 2022-06-28 00:00:00 +0000 - title: 'Fully-Connected Network on Noncompact Symmetric Space and Ridgelet Transform based on Helgason-Fourier Analysis' abstract: 'Neural networks on Riemannian symmetric spaces such as hyperbolic space and the manifold of symmetric positive definite (SPD) matrices are an emerging subject of research in geometric deep learning. Based on the well-established framework of the Helgason-Fourier transform on the noncompact symmetric space, we present a fully-connected network and its associated ridgelet transform on the noncompact symmetric space, covering the hyperbolic neural network (HNN) and the SPDNet as special cases. The ridgelet transform is an analysis operator of a depth-2 continuous network spanned by neurons, namely, it maps an arbitrary given function to the weights of a network. Thanks to the coordinate-free reformulation, the role of nonlinear activation functions is revealed to be a wavelet function. Moreover, the reconstruction formula is applied to present a constructive proof of the universality of finite networks on symmetric spaces.' volume: 162 URL: https://proceedings.mlr.press/v162/sonoda22a.html PDF: https://proceedings.mlr.press/v162/sonoda22a/sonoda22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sonoda22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sho family: Sonoda - given: Isao family: Ishikawa - given: Masahiro family: Ikeda editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20405-20422 id: sonoda22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20405 lastpage: 20422 published: 2022-06-28 00:00:00 +0000 - title: 'Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation' abstract: 'Satisfying safety constraints almost surely (or with probability one) can be critical for the deployment of Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. We show that Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that Saute MDP allows viewing the Safe RL problem from a different perspective, enabling new features. 
For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "Sauteed”. Additionally, state augmentation allows for policy generalization across safety constraints. We finally show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.' volume: 162 URL: https://proceedings.mlr.press/v162/sootla22a.html PDF: https://proceedings.mlr.press/v162/sootla22a/sootla22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sootla22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aivar family: Sootla - given: Alexander I family: Cowen-Rivers - given: Taher family: Jafferjee - given: Ziyan family: Wang - given: David H family: Mguni - given: Jun family: Wang - given: Haitham family: Ammar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20423-20443 id: sootla22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20423 lastpage: 20443 published: 2022-06-28 00:00:00 +0000 - title: 'Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent' abstract: 'Coded distributed computation has become common practice for performing gradient descent on large datasets to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords by maximizing the information contained in the codewords while minimizing the information between the codewords. The utility of this application of coding theory is a geometrical consequence of the observed fact in optimization research that noise is tolerable, sometimes even helpful, in gradient descent based learning algorithms since it helps avoid overfitting and local minima. This stands in contrast with much current conventional work on distributed coded computation which focuses on recovering all of the data from the workers. A second further contribution is that the low-weight nature of the coding scheme allows for asynchronous gradient updates since the code can be iteratively decoded; i.e., a worker’s task can immediately be updated into the larger gradient. The directional derivative is always a linear function of the direction vectors; thus, our framework is robust since it can apply linear coding techniques to general machine learning frameworks such as deep neural networks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/soto22a.html PDF: https://proceedings.mlr.press/v162/soto22a/soto22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-soto22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pedro J family: Soto - given: Ilia family: Ilmer - given: Haibin family: Guan - given: Jun family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20444-20458 id: soto22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20444 lastpage: 20458 published: 2022-06-28 00:00:00 +0000 - title: 'Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders' abstract: 'Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.' volume: 162 URL: https://proceedings.mlr.press/v162/stanton22a.html PDF: https://proceedings.mlr.press/v162/stanton22a/stanton22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-stanton22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samuel family: Stanton - given: Wesley family: Maddox - given: Nate family: Gruver - given: Phillip family: Maffettone - given: Emily family: Delaney - given: Peyton family: Greenside - given: Andrew Gordon family: Wilson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20459-20478 id: stanton22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20459 lastpage: 20478 published: 2022-06-28 00:00:00 +0000 - title: '3D Infomax improves GNNs for Molecular Property Prediction' abstract: 'Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Although the 3D molecular graph structure is necessary for models to achieve strong performance on many tasks, it is infeasible to obtain 3D structures at the scale required by many real-world applications. To tackle this issue, we propose to use existing 3D molecular datasets to pre-train a model to reason about the geometry of molecules given only their 2D molecular graphs. 
Our method, called 3D Infomax, maximizes the mutual information between learned 3D summary vectors and the representations of a graph neural network (GNN). During fine-tuning on molecules with unknown geometry, the GNN is still able to produce implicit 3D information and uses it for downstream tasks. We show that 3D Infomax provides significant improvements for a wide range of properties, including a 22% average MAE reduction on QM9 quantum mechanical properties. Moreover, the learned representations can be effectively transferred between datasets in different molecular spaces.' volume: 162 URL: https://proceedings.mlr.press/v162/stark22a.html PDF: https://proceedings.mlr.press/v162/stark22a/stark22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-stark22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hannes family: Stärk - given: Dominique family: Beaini - given: Gabriele family: Corso - given: Prudencio family: Tossou - given: Christian family: Dallago - given: Stephan family: Günnemann - given: Pietro family: Lió editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20479-20502 id: stark22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20479 lastpage: 20502 published: 2022-06-28 00:00:00 +0000 - title: 'EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction' abstract: 'Predicting how a drug-like molecule binds to a specific protein target is a core problem in drug discovery. An extremely fast computational binding method would enable key applications such as fast virtual screening or drug engineering. Existing methods are computationally expensive as they rely on heavy candidate sampling coupled with scoring, ranking, and fine-tuning steps. We challenge this paradigm with EquiBind, an SE(3)-equivariant geometric deep learning model performing direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the ligand’s bound pose and orientation. EquiBind achieves significant speed-ups and better quality compared to traditional and recent baselines. Further, we show extra improvements when coupling it with existing fine-tuning techniques at the cost of increased running time. Finally, we propose a novel and fast fine-tuning model that adjusts torsion angles of a ligand’s rotatable bonds based on closed form global minima of the von Mises angular distance to a given input atomic point cloud, avoiding previous expensive differential evolution strategies for energy minimization.' 
volume: 162 URL: https://proceedings.mlr.press/v162/stark22b.html PDF: https://proceedings.mlr.press/v162/stark22b/stark22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-stark22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hannes family: Stärk - given: Octavian family: Ganea - given: Lagnajit family: Pattanaik - given: Dr.Regina family: Barzilay - given: Tommi family: Jaakkola editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20503-20521 id: stark22b issued: date-parts: - 2022 - 6 - 28 firstpage: 20503 lastpage: 20521 published: 2022-06-28 00:00:00 +0000 - title: 'Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks' abstract: 'Model inversion attacks (MIAs) aim to create synthetic images that reflect the class-wise characteristics from a target classifier’s private training data by exploiting the model’s learned knowledge. Previous research has developed generative MIAs that use generative adversarial networks (GANs) as image priors tailored to a specific target model. This makes the attacks time- and resource-consuming, inflexible, and susceptible to distributional shifts between datasets. To overcome these drawbacks, we present Plug & Play Attacks, which relax the dependency between the target model and image prior, and enable the use of a single GAN to attack a wide range of targets, requiring only minor adjustments to the attack. Moreover, we show that powerful MIAs are possible even with publicly available pre-trained GANs and under strong distributional shifts, for which previous approaches fail to produce meaningful results. Our extensive evaluation confirms the improved robustness and flexibility of Plug & Play Attacks and their ability to create high-quality images revealing sensitive class characteristics.' volume: 162 URL: https://proceedings.mlr.press/v162/struppek22a.html PDF: https://proceedings.mlr.press/v162/struppek22a/struppek22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-struppek22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lukas family: Struppek - given: Dominik family: Hintersdorf - given: Antonio family: De Almeida Correira - given: Antonia family: Adler - given: Kristian family: Kersting editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20522-20545 id: struppek22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20522 lastpage: 20545 published: 2022-06-28 00:00:00 +0000 - title: 'Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework' abstract: 'Enforcing orthogonality in convolutional neural networks is a remedy for gradient vanishing/exploding problems and sensitivity to perturbation. Many previous approaches for orthogonal convolutions enforce orthogonality on the flattened kernel, which, however, does not lead to the orthogonality of the operation. Some recent approaches consider orthogonality for standard convolutional layers and propose specific classes of their realizations. 
In this work, we propose a theoretical framework that establishes the equivalence between diverse orthogonal convolutional layers in the spatial domain and the paraunitary systems in the spectral domain. Since 1D paraunitary systems admit a complete factorization, we can parameterize any separable orthogonal convolution as a composition of spatial filters. As a result, our framework endows high expressive power to various convolutional layers while maintaining their exact orthogonality. Furthermore, our layers are memory and computationally efficient for deep networks compared to previous designs. Our versatile framework, for the first time, enables the study of architectural designs for deep orthogonal networks, such as choices of skip connection, initialization, stride, and dilation. Consequently, we scale up orthogonal networks to deep architectures, including ResNet and ShuffleNet, substantially outperforming their shallower counterparts. Finally, we show how to construct residual flows, a flow-based generative model that requires strict Lipschitzness, using our orthogonal networks. Our code will be publicly available at https://github.com/umd-huang-lab/ortho-conv' volume: 162 URL: https://proceedings.mlr.press/v162/su22a.html PDF: https://proceedings.mlr.press/v162/su22a/su22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-su22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiahao family: Su - given: Wonmin family: Byeon - given: Furong family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20546-20579 id: su22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20546 lastpage: 20579 published: 2022-06-28 00:00:00 +0000 - title: 'Divergence-Regularized Multi-Agent Actor-Critic' abstract: 'Entropy regularization is a popular method in reinforcement learning (RL). Although it has many advantages, it alters the RL objective and makes the converged policy deviate from the optimal policy of the original Markov Decision Process (MDP). Though divergence regularization has been proposed to settle this problem, it cannot be trivially applied to cooperative multi-agent reinforcement learning (MARL). In this paper, we investigate divergence regularization in cooperative MARL and propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC). Theoretically, we derive the update rule of DMAC which is naturally off-policy, guarantees the monotonic policy improvement and convergence in both the original MDP and the divergence-regularized MDP, and is not biased by the regularization. We also give a bound of the discrepancy between the converged policy and the optimal policy in the original MDP. DMAC is a flexible framework and can be combined with many existing MARL algorithms. Empirically, we evaluate DMAC in a didactic stochastic game and StarCraft Multi-Agent Challenge and show that DMAC substantially improves the performance of existing MARL algorithms.' 
volume: 162 URL: https://proceedings.mlr.press/v162/su22b.html PDF: https://proceedings.mlr.press/v162/su22b/su22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-su22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kefan family: Su - given: Zongqing family: Lu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20580-20603 id: su22b issued: date-parts: - 2022 - 6 - 28 firstpage: 20580 lastpage: 20603 published: 2022-06-28 00:00:00 +0000 - title: 'Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems' abstract: 'Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation is the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for deep RL to be applicable. We focus on domains where agents interact with a reduced portion of a larger environment while still being affected by the global dynamics. Our method combines the use of local simulators with learned models that mimic the influence of the global system. The experiments reveal that incorporating this idea into the deep RL workflow can considerably accelerate the training process and presents several opportunities for the future.' volume: 162 URL: https://proceedings.mlr.press/v162/suau22a.html PDF: https://proceedings.mlr.press/v162/suau22a/suau22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-suau22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Miguel family: Suau - given: Jinke family: He - given: Matthijs T. J. family: Spaan - given: Frans family: Oliehoek editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20604-20624 id: suau22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20604 lastpage: 20624 published: 2022-06-28 00:00:00 +0000 - title: 'Improved StyleGAN-v2 based Inversion for Out-of-Distribution Images' abstract: 'Inverting an image onto the latent space of pre-trained generators, e.g., StyleGAN-v2, has emerged as a popular strategy to leverage strong image priors for ill-posed restoration. Several studies have shown that this approach is effective at inverting images similar to the data used for training. However, with out-of-distribution (OOD) data that the generator has not been exposed to, existing inversion techniques produce sub-optimal results. In this paper, we propose SPHInX (StyleGAN with Projection Heads for Inverting X), an approach for accurately embedding OOD images onto the StyleGAN latent space. SPHInX optimizes a style projection head using a novel training strategy that imposes a vicinal regularization in the StyleGAN latent space. To further enhance OOD inversion, SPHInX can additionally optimize a content projection head and noise variables in every layer. 
Our empirical studies on a suite of OOD data show that, in addition to producing higher quality reconstructions over the state-of-the-art inversion techniques, SPHInX is effective for ill-posed restoration tasks while offering semantic editing capabilities.' volume: 162 URL: https://proceedings.mlr.press/v162/subramanyam22a.html PDF: https://proceedings.mlr.press/v162/subramanyam22a/subramanyam22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-subramanyam22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rakshith family: Subramanyam - given: Vivek family: Narayanaswamy - given: Mark family: Naufel - given: Andreas family: Spanias - given: Jayaraman J. family: Thiagarajan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20625-20639 id: subramanyam22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20625 lastpage: 20639 published: 2022-06-28 00:00:00 +0000 - title: 'Continuous-Time Analysis of Accelerated Gradient Methods via Conservation Laws in Dilated Coordinate Systems' abstract: 'We analyze continuous-time models of accelerated gradient methods through deriving conservation laws in dilated coordinate systems. Namely, instead of analyzing the dynamics of $X(t)$, we analyze the dynamics of $W(t)=t^\alpha(X(t)-X_c)$ for some $\alpha$ and $X_c$ and derive a conserved quantity, analogous to physical energy, in this dilated coordinate system. Through this methodology, we recover many known continuous-time analyses in a streamlined manner and obtain novel continuous-time analyses for OGM-G, an acceleration mechanism for efficiently reducing gradient magnitude that is distinct from that of Nesterov. Finally, we show that a semi-second-order symplectic Euler discretization in the dilated coordinate system leads to an $\mathcal{O}(1/k^2)$ rate on the standard setup of smooth convex minimization, without any further assumptions such as infinite differentiability.' volume: 162 URL: https://proceedings.mlr.press/v162/suh22a.html PDF: https://proceedings.mlr.press/v162/suh22a/suh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-suh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jaewook J family: Suh - given: Gyumin family: Roh - given: Ernest K family: Ryu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20640-20667 id: suh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20640 lastpage: 20667 published: 2022-06-28 00:00:00 +0000 - title: 'Do Differentiable Simulators Give Better Policy Gradients?' abstract: 'Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. 
We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an $\alpha$-order gradient estimator, with $\alpha \in [0,1]$, which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zero-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the $\alpha$-order estimator on some numerical examples.' volume: 162 URL: https://proceedings.mlr.press/v162/suh22b.html PDF: https://proceedings.mlr.press/v162/suh22b/suh22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-suh22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hyung Ju family: Suh - given: Max family: Simchowitz - given: Kaiqing family: Zhang - given: Russ family: Tedrake editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20668-20696 id: suh22b issued: date-parts: - 2022 - 6 - 28 firstpage: 20668 lastpage: 20696 published: 2022-06-28 00:00:00 +0000 - title: 'Intriguing Properties of Input-Dependent Randomized Smoothing' abstract: 'Randomized smoothing is currently considered the state-of-the-art method to obtain certifiably robust classifiers. Despite its remarkable performance, the method is associated with various serious problems such as “certified accuracy waterfalls”, certification vs. accuracy trade-off, or even fairness issues. Input-dependent smoothing approaches have been proposed with intention of overcoming these flaws. However, we demonstrate that these methods lack formal guarantees and so the resulting certificates are not justified. We show that in general, the input-dependent smoothing suffers from the curse of dimensionality, forcing the variance function to have low semi-elasticity. On the other hand, we provide a theoretical and practical framework that enables the usage of input-dependent smoothing even in the presence of the curse of dimensionality, under strict restrictions. We present one concrete design of the smoothing variance function and test it on CIFAR10 and MNIST. Our design mitigates some of the problems of classical smoothing and is formally underlined, yet further improvement of the design is still necessary.' 
volume: 162 URL: https://proceedings.mlr.press/v162/sukeni-k22a.html PDF: https://proceedings.mlr.press/v162/sukeni-k22a/sukeni-k22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sukeni-k22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peter family: Súkenı́k - given: Aleksei family: Kuvshinov - given: Stephan family: Günnemann editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20697-20743 id: sukeni-k22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20697 lastpage: 20743 published: 2022-06-28 00:00:00 +0000 - title: 'Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments' abstract: 'Visualizing optimization landscapes has resulted in many fundamental insights in numeric optimization, specifically regarding novel improvements to optimization techniques. However, visualizations of the objective that reinforcement learning optimizes (the "reward surface") have only ever been generated for a small number of narrow contexts. This work presents reward surfaces and related visualizations of 27 of the most widely used reinforcement learning environments in Gym for the first time. We also explore reward surfaces in the policy gradient direction and show for the first time that many popular reinforcement learning environments have frequent "cliffs" (sudden large drops in expected reward). We demonstrate that A2C often "dives off" these cliffs into low reward regions of the parameter space while PPO avoids them, confirming a popular intuition for PPO’s improved performance over previous methods. We additionally introduce a highly extensible library that allows researchers to easily generate these visualizations in the future. Our findings provide new intuition to explain the successes and failures of modern RL methods, and our visualizations concretely characterize several failure modes of reinforcement learning agents in novel ways.' volume: 162 URL: https://proceedings.mlr.press/v162/sullivan22a.html PDF: https://proceedings.mlr.press/v162/sullivan22a/sullivan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sullivan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ryan family: Sullivan - given: Jordan K family: Terry - given: Benjamin family: Black - given: John P family: Dickerson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20744-20776 id: sullivan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20744 lastpage: 20776 published: 2022-06-28 00:00:00 +0000 - title: 'AGNAS: Attention-Guided Micro and Macro-Architecture Search' abstract: 'Micro- and macro-architecture search have emerged as two popular NAS paradigms recently. Existing methods leverage different search strategies for searching micro- and macro- architectures. When using architecture parameters to search for micro-structure such as normal cell and reduction cell, the architecture parameters can not fully reflect the corresponding operation importance. 
When searching for the macro-structure chained by pre-defined blocks, many sub-networks need to be sampled for evaluation, which is very time-consuming. To address the two issues, we propose a new search paradigm, namely AGNAS, which leverages the attention mechanism to guide the micro- and macro-architecture search. Specifically, we introduce an attention module and plug it behind each candidate operation or each candidate block. We utilize the attention weights to represent the importance of the relevant operations for the micro search or the importance of the relevant blocks for the macro search. Experimental results show that AGNAS can achieve 2.46% test error on CIFAR-10 in the DARTS search space, and 23.4% test error when directly searching on ImageNet in the ProxylessNAS search space. AGNAS also achieves optimal performance on NAS-Bench-201, outperforming state-of-the-art approaches. The source code is available at https://github.com/Sunzh1996/AGNAS.' volume: 162 URL: https://proceedings.mlr.press/v162/sun22a.html PDF: https://proceedings.mlr.press/v162/sun22a/sun22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sun22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zihao family: Sun - given: Yu family: Hu - given: Shun family: Lu - given: Longxing family: Yang - given: Jilin family: Mei - given: Yinhe family: Han - given: Xiaowei family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20777-20789 id: sun22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20777 lastpage: 20789 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Random Walk Gradient Descent for Decentralized Optimization' abstract: 'In this paper, we study the adaptive step size random walk gradient descent with momentum for decentralized optimization, in which the training samples are dependent on each other. We establish theoretical convergence rates of the adaptive step size random walk gradient descent with momentum for both convex and nonconvex settings. In particular, we prove that adaptive random walk algorithms perform as well as the non-adaptive method for dependent data in general cases but achieve acceleration when the stochastic gradients are “sparse”. Moreover, we study the zeroth-order version of adaptive random walk gradient descent and provide corresponding convergence results. All assumptions used in this paper are mild and general, making our results applicable to many machine learning problems.' 
volume: 162 URL: https://proceedings.mlr.press/v162/sun22b.html PDF: https://proceedings.mlr.press/v162/sun22b/sun22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sun22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tao family: Sun - given: Dongsheng family: Li - given: Bao family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20790-20809 id: sun22b issued: date-parts: - 2022 - 6 - 28 firstpage: 20790 lastpage: 20809 published: 2022-06-28 00:00:00 +0000 - title: 'MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection' abstract: 'In object detection, the detection backbone consumes more than half of the overall inference cost. Recent research attempts to reduce this cost by optimizing the backbone architecture with the help of Neural Architecture Search (NAS). However, existing NAS methods for object detection require hundreds to thousands of GPU hours of searching, making them impractical in fast-paced research and development. In this work, we propose a novel zero-shot NAS method to address this issue. The proposed method, named MAE-DET, automatically designs efficient detection backbones via the Maximum Entropy Principle without training network parameters, reducing the architecture design cost to nearly zero yet delivering the state-of-the-art (SOTA) performance. Under the hood, MAE-DET maximizes the differential entropy of detection backbones, leading to a better feature extractor for object detection under the same computational budgets. After merely one GPU day of fully automatic design, MAE-DET innovates SOTA detection backbones on multiple detection benchmark datasets with little human intervention. Compared to the ResNet-50 backbone, MAE-DET is $+2.0%$ better in mAP when using the same amount of FLOPs/parameters, and is $1.54$ times faster on NVIDIA V100 at the same mAP. Code and pre-trained models are available here (https://github.com/alibaba/lightweight-neural-architecture-search).' volume: 162 URL: https://proceedings.mlr.press/v162/sun22c.html PDF: https://proceedings.mlr.press/v162/sun22c/sun22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sun22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhenhong family: Sun - given: Ming family: Lin - given: Xiuyu family: Sun - given: Zhiyu family: Tan - given: Hao family: Li - given: Rong family: Jin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20810-20826 id: sun22c issued: date-parts: - 2022 - 6 - 28 firstpage: 20810 lastpage: 20826 published: 2022-06-28 00:00:00 +0000 - title: 'Out-of-Distribution Detection with Deep Nearest Neighbors' abstract: 'Out-of-distribution (OOD) detection is a critical task for deploying machine learning models in the open world. Distance-based methods have demonstrated promise, where testing samples are detected as OOD if they are relatively far away from in-distribution (ID) data. 
However, prior methods impose a strong distributional assumption on the underlying feature space, which may not always hold. In this paper, we explore the efficacy of non-parametric nearest-neighbor distance for OOD detection, which has been largely overlooked in the literature. Unlike prior works, our method does not impose any distributional assumption, hence providing stronger flexibility and generality. We demonstrate the effectiveness of nearest-neighbor-based OOD detection on several benchmarks and establish superior performance. Under the same model trained on ImageNet-1k, our method substantially reduces the false positive rate (FPR@TPR95) by 24.77% compared to a strong baseline, SSD+, which uses a parametric approach based on the Mahalanobis distance in detection. Code is available: https://github.com/deeplearning-wisc/knn-ood.' volume: 162 URL: https://proceedings.mlr.press/v162/sun22d.html PDF: https://proceedings.mlr.press/v162/sun22d/sun22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sun22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yiyou family: Sun - given: Yifei family: Ming - given: Xiaojin family: Zhu - given: Yixuan family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20827-20840 id: sun22d issued: date-parts: - 2022 - 6 - 28 firstpage: 20827 lastpage: 20840 published: 2022-06-28 00:00:00 +0000 - title: 'Black-Box Tuning for Language-Model-as-a-Service' abstract: 'Extremely large pre-trained language models (PTMs) such as GPT-3 are usually released as a service. This allows users to design task-specific prompts to query the PTMs through some black-box APIs. In such a scenario, which we call Language-Model-as-a-Service (LMaaS), the gradients of PTMs are usually unavailable. Can we optimize the task prompts by only accessing the model inference APIs? This paper proposes the black-box tuning framework to optimize the continuous prompt prepended to the input text via derivative-free optimization. Instead of optimizing in the original high-dimensional prompt space, which is intractable for traditional derivative-free optimization, we perform optimization in a randomly generated subspace due to the low intrinsic dimensionality of large PTMs. The experimental results show that the black-box tuning with RoBERTa on a few labeled samples not only significantly outperforms manual prompts and GPT-3’s in-context learning, but also surpasses the gradient-based counterparts, i.e., prompt tuning and full model tuning.' 
volume: 162 URL: https://proceedings.mlr.press/v162/sun22e.html PDF: https://proceedings.mlr.press/v162/sun22e/sun22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sun22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianxiang family: Sun - given: Yunfan family: Shao - given: Hong family: Qian - given: Xuanjing family: Huang - given: Xipeng family: Qiu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20841-20855 id: sun22e issued: date-parts: - 2022 - 6 - 28 firstpage: 20841 lastpage: 20855 published: 2022-06-28 00:00:00 +0000 - title: 'Correlated Quantization for Distributed Mean Estimation and Optimization' abstract: 'We study the problem of distributed mean estimation and optimization under communication constraints. We propose a correlated quantization protocol whose error guarantee depends on the deviation of data points instead of their absolute range. The design doesn’t need any prior knowledge on the concentration property of the dataset, which is required to get such dependence in previous works. We show that applying the proposed protocol as a sub-routine in distributed optimization algorithms leads to better convergence rates. We also prove the optimality of our protocol under mild assumptions. Experimental results show that our proposed algorithm outperforms existing mean estimation protocols on a diverse set of tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/suresh22a.html PDF: https://proceedings.mlr.press/v162/suresh22a/suresh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-suresh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ananda Theertha family: Suresh - given: Ziteng family: Sun - given: Jae family: Ro - given: Felix family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20856-20876 id: suresh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20856 lastpage: 20876 published: 2022-06-28 00:00:00 +0000 - title: 'Causal Imitation Learning under Temporally Correlated Noise' abstract: 'We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the instrumental variable regression (IVR) technique of econometrics, enabling us to recover the underlying policy without requiring access to an interactive expert. In particular, we present two techniques, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/swamy22a.html PDF: https://proceedings.mlr.press/v162/swamy22a/swamy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-swamy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Gokul family: Swamy - given: Sanjiban family: Choudhury - given: Drew family: Bagnell - given: Steven family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20877-20890 id: swamy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20877 lastpage: 20890 published: 2022-06-28 00:00:00 +0000 - title: 'Being Properly Improper' abstract: 'Properness for supervised losses stipulates that the loss function shapes the learning algorithm towards the true posterior of the data generating distribution. Unfortunately, data in modern machine learning can be corrupted or twisted in many ways. Hence, optimizing a proper loss function on twisted data could perilously lead the learning algorithm towards the twisted posterior, rather than to the desired clean posterior. Many papers cope with specific twists (e.g., label/feature/adversarial noise), but there is a growing need for a unified and actionable understanding atop properness. Our chief theoretical contribution is a generalization of the properness framework with a notion called twist-properness, which delineates loss functions with the ability to "untwist" the twisted posterior into the clean posterior. Notably, we show that a nontrivial extension of a loss function called alpha-loss, which was first introduced in information theory, is twist-proper. We study the twist-proper alpha-loss under a novel boosting algorithm, called PILBoost, and provide formal and experimental results for this algorithm. Our overarching practical conclusion is that the twist-proper alpha-loss outperforms the proper log-loss on several variants of twisted data.' volume: 162 URL: https://proceedings.mlr.press/v162/sypherd22a.html PDF: https://proceedings.mlr.press/v162/sypherd22a/sypherd22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-sypherd22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tyler family: Sypherd - given: Richard family: Nock - given: Lalitha family: Sankar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20891-20932 id: sypherd22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20891 lastpage: 20932 published: 2022-06-28 00:00:00 +0000 - title: 'Distributionally-Aware Kernelized Bandit Problems for Risk Aversion' abstract: 'The kernelized bandit problem is a theoretically justified framework and has solid applications to various fields. Recently, there is a growing interest in generalizing the problem to the optimization of risk-averse metrics such as Conditional Value-at-Risk (CVaR) or Mean-Variance (MV). However, due to the model assumption, most existing methods need explicit design of environment random variables and can incur large regret because of possible high dimensionality of them. 
To address the issues, in this paper, we model environments using a family of the output distributions (or more precisely, probability kernel) and Kernel Mean Embeddings (KME), and provide novel UCB-type algorithms for CVaR and MV. Moreover, we provide algorithm-independent lower bounds for CVaR in the case of Matérn kernels, and propose a nearly optimal algorithm. Furthermore, we empirically verify our theoretical result in synthetic environments, and demonstrate that our proposed method significantly outperforms a baseline in many cases.' volume: 162 URL: https://proceedings.mlr.press/v162/takemori22a.html PDF: https://proceedings.mlr.press/v162/takemori22a/takemori22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-takemori22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sho family: Takemori editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20933-20959 id: takemori22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20933 lastpage: 20959 published: 2022-06-28 00:00:00 +0000 - title: 'Sequential and Parallel Constrained Max-value Entropy Search via Information Lower Bound' abstract: 'Max-value entropy search (MES) is one of the state-of-the-art approaches in Bayesian optimization (BO). In this paper, we propose a novel variant of MES for constrained problems, called Constrained MES via Information lower BOund (CMES-IBO), that is based on a Monte Carlo (MC) estimator of a lower bound of a mutual information (MI). Unlike existing studies, our MI is defined so that uncertainty with respect to feasibility can be incorporated. We derive a lower bound of the MI that guarantees non-negativity, while a constrained counterpart of conventional MES can be negative. We further provide theoretical analysis that assures the low-variability of our estimator which has never been investigated for any existing information-theoretic BO. Moreover, using the conditional MI, we extend CMES-IBO to the parallel setting while maintaining the desirable properties. We demonstrate the effectiveness of CMES-IBO by several benchmark functions and real-world problems.' volume: 162 URL: https://proceedings.mlr.press/v162/takeno22a.html PDF: https://proceedings.mlr.press/v162/takeno22a/takeno22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-takeno22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shion family: Takeno - given: Tomoyuki family: Tamura - given: Kazuki family: Shitara - given: Masayuki family: Karasuyama editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20960-20986 id: takeno22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20960 lastpage: 20986 published: 2022-06-28 00:00:00 +0000 - title: 'SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization' abstract: 'One noted issue of vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, also known as codebook collapse. 
We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend that the quantization is stochastic at the initial stage of the training but gradually converges toward a deterministic quantization, which we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/takida22a.html PDF: https://proceedings.mlr.press/v162/takida22a/takida22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-takida22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuhta family: Takida - given: Takashi family: Shibuya - given: Weihsiang family: Liao - given: Chieh-Hsin family: Lai - given: Junki family: Ohmura - given: Toshimitsu family: Uesaka - given: Naoki family: Murata - given: Shusuke family: Takahashi - given: Toshiyuki family: Kumakura - given: Yuki family: Mitsufuji editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 20987-21012 id: takida22a issued: date-parts: - 2022 - 6 - 28 firstpage: 20987 lastpage: 21012 published: 2022-06-28 00:00:00 +0000 - title: 'A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources' abstract: 'Accurately estimating personalized treatment effects within a study site (e.g., a hospital) has been challenging due to limited sample size. Furthermore, privacy considerations and lack of resources prevent a site from leveraging subject-level data from other sites. We propose a tree-based model averaging approach to improve the estimation accuracy of conditional average treatment effects (CATE) at a target site by leveraging models derived from other potentially heterogeneous sites, without them sharing subject-level data. To our best knowledge, there is no established model averaging approach for distributed data with a focus on improving the estimation of treatment effects. Specifically, under distributed data networks, our framework provides an interpretable tree-based ensemble of CATE estimators that joins models across study sites, while actively modeling the heterogeneity in data sources through site partitioning. The performance of this approach is demonstrated by a real-world study of the causal effects of oxygen therapy on hospital survival rate and backed up by comprehensive simulation results.' volume: 162 URL: https://proceedings.mlr.press/v162/tan22a.html PDF: https://proceedings.mlr.press/v162/tan22a/tan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaoqing family: Tan - given: Chung-Chou H. 
family: Chang - given: Ling family: Zhou - given: Lu family: Tang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21013-21036 id: tan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21013 lastpage: 21036 published: 2022-06-28 00:00:00 +0000 - title: 'N-Penetrate: Active Learning of Neural Collision Handler for Complex 3D Mesh Deformations' abstract: 'We present a robust learning algorithm to detect and handle collisions in 3D deforming meshes. We first train a neural network to detect collisions and then use a numerical optimization algorithm to resolve penetrations guided by the network. Our learned collision handler can resolve collisions for unseen, high-dimensional meshes with thousands of vertices. To obtain stable network performance in such large and unseen spaces, we apply active learning by progressively inserting new collision data based on the network inferences. We automatically label these new data using an analytical collision detector and progressively fine-tune our detection networks. We evaluate our method for collision handling of complex, 3D meshes coming from several datasets with different shapes and topologies, including datasets corresponding to dressed and undressed human poses, cloth simulations, and human hand poses acquired using multi-view capture systems.' volume: 162 URL: https://proceedings.mlr.press/v162/tan22b.html PDF: https://proceedings.mlr.press/v162/tan22b/tan22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tan22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qingyang family: Tan - given: Zherong family: Pan - given: Breannan family: Smith - given: Takaaki family: Shiratori - given: Dinesh family: Manocha editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21037-21049 id: tan22b issued: date-parts: - 2022 - 6 - 28 firstpage: 21037 lastpage: 21049 published: 2022-06-28 00:00:00 +0000 - title: 'Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning' abstract: 'Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number of poorly-understood discrepancies between theory and practice. Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, we investigate such a discrepancy. In particular, (1) We show that unbiased gradient estimates have variance $\Theta(N)$ which linearly depends on the sample size $N$ of the inner loop updates; (2) We propose linearized score function (LSF) gradient estimates, which have bias $\mathcal{O}(1/\sqrt{N})$ and variance $\mathcal{O}(1/N)$; (3) We show that most empirical prior work in fact implements variants of the LSF gradient estimates. This implies that practical algorithms "accidentally" introduce bias to achieve better performance; (4) We establish theoretical guarantees for the LSF gradient estimates in meta-RL regarding their convergence to stationary points, showing better dependency on $N$ than prior work when $N$ is large.'
volume: 162 URL: https://proceedings.mlr.press/v162/tang22a.html PDF: https://proceedings.mlr.press/v162/tang22a/tang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yunhao family: Tang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21050-21075 id: tang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21050 lastpage: 21075 published: 2022-06-28 00:00:00 +0000 - title: 'Rethinking Graph Neural Networks for Anomaly Detection' abstract: 'Graph Neural Networks (GNNs) are widely applied for graph anomaly detection. As one of the key components for GNN design is to select a tailored spectral filter, we take the first step towards analyzing anomalies via the lens of the graph spectrum. Our crucial observation is that the existence of anomalies will lead to the ‘right-shift’ phenomenon, that is, the spectral energy distribution concentrates less on low frequencies and more on high frequencies. This fact motivates us to propose the Beta Wavelet Graph Neural Network (BWGNN). Indeed, BWGNN has spectral and spatial localized band-pass filters to better handle the ‘right-shift’ phenomenon in anomalies. We demonstrate the effectiveness of BWGNN on four large-scale anomaly detection datasets. Our code and data are released at https://github.com/squareRoot3/Rethinking-Anomaly-Detection.' volume: 162 URL: https://proceedings.mlr.press/v162/tang22b.html PDF: https://proceedings.mlr.press/v162/tang22b/tang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jianheng family: Tang - given: Jiajin family: Li - given: Ziqi family: Gao - given: Jia family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21076-21089 id: tang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 21076 lastpage: 21089 published: 2022-06-28 00:00:00 +0000 - title: 'Deep Safe Incomplete Multi-view Clustering: Theorem and Algorithm' abstract: 'Incomplete multi-view clustering is a significant but challenging task. Although jointly imputing incomplete samples and conducting clustering has been shown to achieve promising performance, learning from both complete and incomplete data may be worse than learning only from complete data, particularly when imputed views are semantically inconsistent with missing views. To address this issue, we propose a novel framework to reduce the clustering performance degradation risk from semantically inconsistent imputed views. Concretely, by the proposed bi-level optimization framework, missing views are dynamically imputed from the learned semantic neighbors, and imputed samples are automatically selected for training. In theory, the empirical risk of the model is no higher than learning only from complete data, and the model is never worse than learning only from complete data in terms of expected risk with high probability.
Comprehensive experiments demonstrate that the proposed method achieves superior performance and efficient safe incomplete multi-view clustering.' volume: 162 URL: https://proceedings.mlr.press/v162/tang22c.html PDF: https://proceedings.mlr.press/v162/tang22c/tang22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Huayi family: Tang - given: Yong family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21090-21110 id: tang22c issued: date-parts: - 2022 - 6 - 28 firstpage: 21090 lastpage: 21110 published: 2022-06-28 00:00:00 +0000 - title: 'Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning' abstract: 'In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift. We propose a different approach named virtual homogeneity learning (VHL) to directly “rectify” the data heterogeneity. In particular, VHL conducts FL with a virtual homogeneous dataset crafted to satisfy two conditions: containing no private information and being separable. The virtual dataset can be generated from pure noise shared across clients, aiming to calibrate the features from the heterogeneous clients. Theoretically, we prove that VHL can achieve provable generalization performance on the natural distribution. Empirically, we demonstrate that VHL endows FL with drastically improved convergence speed and generalization performance. VHL is the first attempt towards using a virtual dataset to address data heterogeneity, offering new and effective means to FL.' volume: 162 URL: https://proceedings.mlr.press/v162/tang22d.html PDF: https://proceedings.mlr.press/v162/tang22d/tang22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tang22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhenheng family: Tang - given: Yonggang family: Zhang - given: Shaohuai family: Shi - given: Xin family: He - given: Bo family: Han - given: Xiaowen family: Chu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21111-21132 id: tang22d issued: date-parts: - 2022 - 6 - 28 firstpage: 21111 lastpage: 21132 published: 2022-06-28 00:00:00 +0000 - title: 'Cross-Space Active Learning on Graph Convolutional Networks' abstract: 'This paper formalizes cross-space active learning on a graph convolutional network (GCN). The objective is to attain the most accurate hypothesis available in any of the instance spaces generated by the GCN. Subject to the objective, the challenge is to minimize the label cost, measured in the number of vertices whose labels are requested. Our study covers both budget algorithms which terminate after a designated number of label requests, and verifiable algorithms which terminate only after having found an accurate hypothesis. A new separation in label complexity between the two algorithm types is established. 
The separation is unique to GCNs.' volume: 162 URL: https://proceedings.mlr.press/v162/tao22a.html PDF: https://proceedings.mlr.press/v162/tao22a/tao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yufei family: Tao - given: Hao family: Wu - given: Shiyuan family: Deng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21133-21145 id: tao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21133 lastpage: 21145 published: 2022-06-28 00:00:00 +0000 - title: 'FedNest: Federated Bilevel, Minimax, and Compositional Optimization' abstract: 'Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems - including adversarial robustness, hyperparameter tuning, actor-critic - fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose FedNest: A federated alternating stochastic gradient method to address general nested problems. We establish provable convergence rates for FedNest in the presence of heterogeneous data and introduce variations for bilevel, minimax, and compositional optimization. FedNest introduces multiple innovations including federated hypergradient computation and variance reduction to address inner-level heterogeneity. We complement our theory with experiments on hyperparameter & hyper-representation learning and minimax optimization that demonstrate the benefits of our method in practice.' volume: 162 URL: https://proceedings.mlr.press/v162/tarzanagh22a.html PDF: https://proceedings.mlr.press/v162/tarzanagh22a/tarzanagh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tarzanagh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Davoud Ataee family: Tarzanagh - given: Mingchen family: Li - given: Christos family: Thrampoulidis - given: Samet family: Oymak editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21146-21179 id: tarzanagh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21146 lastpage: 21179 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Distributionally Robust Bayesian Optimization with Worst-case Sensitivity' abstract: 'In distributionally robust Bayesian optimization (DRBO), an exact computation of the worst-case expected value requires solving an expensive convex optimization problem. We develop a fast approximation of the worst-case expected value based on the notion of worst-case sensitivity that caters to arbitrary convex distribution distances. We provide a regret bound for our novel DRBO algorithm with the fast approximation, and empirically show it is competitive with that using the exact worst-case expected value while incurring significantly less computation time. 
In order to guide the choice of distribution distance to be used with DRBO, we show that our approximation implicitly optimizes an objective close to an interpretable risk-sensitive value.' volume: 162 URL: https://proceedings.mlr.press/v162/tay22a.html PDF: https://proceedings.mlr.press/v162/tay22a/tay22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tay22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sebastian Shenghong family: Tay - given: Chuan Sheng family: Foo - given: Urano family: Daisuke - given: Richalynn family: Leong - given: Bryan Kian Hsiang family: Low editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21180-21204 id: tay22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21180 lastpage: 21204 published: 2022-06-28 00:00:00 +0000 - title: 'LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood' abstract: 'Most of the existing methods for estimating the local intrinsic dimension of a data distribution do not scale well to high dimensional data. Many of them rely on a non-parametric nearest neighbours approach which suffers from the curse of dimensionality. We attempt to address that challenge by proposing a novel approach to the problem: Local Intrinsic Dimension estimation using approximate Likelihood (LIDL). Our method relies on an arbitrary density estimation method as its subroutine, and hence tries to sidestep the dimensionality challenge by making use of the recent progress in parametric neural methods for likelihood estimation. We carefully investigate the empirical properties of the proposed method, compare them with our theoretical predictions, show that LIDL yields competitive results on the standard benchmarks for this problem, and that it scales to thousands of dimensions. What is more, we anticipate this approach to improve further with the continuing advances in the density estimation literature.' volume: 162 URL: https://proceedings.mlr.press/v162/tempczyk22a.html PDF: https://proceedings.mlr.press/v162/tempczyk22a/tempczyk22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tempczyk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Piotr family: Tempczyk - given: Rafał family: Michaluk - given: Lukasz family: Garncarek - given: Przemysław family: Spurek - given: Jacek family: Tabor - given: Adam family: Golinski editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21205-21231 id: tempczyk22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21205 lastpage: 21231 published: 2022-06-28 00:00:00 +0000 - title: 'LCANets: Lateral Competition Improves Robustness Against Corruption and Attack' abstract: 'Although Convolutional Neural Networks (CNNs) achieve high accuracy on image recognition tasks, they lack robustness against realistic corruptions and fail catastrophically when deliberately attacked. 
Previous CNNs with representations similar to primary visual cortex (V1) were more robust to adversarial attacks on images than current adversarial defense techniques, but they required training on large-scale neural recordings or handcrafting neuroscientific models. Motivated by evidence that neural activity in V1 is sparse, we develop a class of hybrid CNNs, called LCANets, which feature a frontend that performs sparse coding via local lateral competition. We demonstrate that LCANets achieve competitive clean accuracy to standard CNNs on action and image recognition tasks and significantly greater accuracy under various image corruptions. We also perform the first adversarial attacks with full knowledge of a sparse coding CNN layer by attacking LCANets with white-box and black-box attacks, and we show that, contrary to previous hypotheses, sparse coding layers are not very robust to white-box attacks. Finally, we propose a way to use sparse coding layers as a plug-and-play robust frontend by showing that they significantly increase the robustness of adversarially-trained CNNs over corruptions and attacks.' volume: 162 URL: https://proceedings.mlr.press/v162/teti22a.html PDF: https://proceedings.mlr.press/v162/teti22a/teti22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-teti22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Michael family: Teti - given: Garrett family: Kenyon - given: Ben family: Migliori - given: Juston family: Moore editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21232-21252 id: teti22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21232 lastpage: 21252 published: 2022-06-28 00:00:00 +0000 - title: 'Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees' abstract: 'Deep neural network-based classifiers have been shown to be vulnerable to imperceptible perturbations to their input, such as $\ell_p$-bounded norm adversarial attacks. This has motivated the development of many defense methods, which are then broken by new attacks, and so on. This paper focuses on a different but related problem of reverse engineering adversarial attacks. Specifically, given an attacked signal, we study conditions under which one can determine the type of attack ($\ell_1$, $\ell_2$ or $\ell_\infty$) and recover the clean signal. We pose this problem as a block-sparse recovery problem, where both the signal and the attack are assumed to lie in a union of subspaces that includes one subspace per class and one subspace per attack type. We derive geometric conditions on the subspaces under which any attacked signal can be decomposed as the sum of a clean signal plus an attack. In addition, by determining the subspaces that contain the signal and the attack, we can also classify the signal and determine the attack type. Experiments on digit and face classification demonstrate the effectiveness of the proposed approach.' 
volume: 162 URL: https://proceedings.mlr.press/v162/thaker22a.html PDF: https://proceedings.mlr.press/v162/thaker22a/thaker22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-thaker22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Darshan family: Thaker - given: Paris family: Giampouras - given: Rene family: Vidal editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21253-21271 id: thaker22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21253 lastpage: 21271 published: 2022-06-28 00:00:00 +0000 - title: 'Generalised Policy Improvement with Geometric Policy Composition' abstract: 'We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. The new method builds on the concept of a geometric horizon model (GHM, also known as a \gamma-model), which models the discounted state-visitation distribution of a given policy. We show that we can evaluate any non-Markov policy that switches between a set of base Markov policies with fixed probability by a careful composition of the base policy GHMs, without any additional learning. We can then apply generalised policy improvement (GPI) to collections of such non-Markov policies to obtain a new Markov policy that will in general outperform its precursors. We provide a thorough theoretical analysis of this approach, develop applications to transfer and standard RL, and empirically demonstrate its effectiveness over standard GPI on a challenging deep RL continuous control task. We also provide an analysis of GHM training methods, proving a novel convergence result regarding previously proposed methods and showing how to train these models stably in deep RL settings.' volume: 162 URL: https://proceedings.mlr.press/v162/thakoor22a.html PDF: https://proceedings.mlr.press/v162/thakoor22a/thakoor22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-thakoor22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shantanu family: Thakoor - given: Mark family: Rowland - given: Diana family: Borsa - given: Will family: Dabney - given: Remi family: Munos - given: Andre family: Barreto editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21272-21307 id: thakoor22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21272 lastpage: 21307 published: 2022-06-28 00:00:00 +0000 - title: 'Algorithms for the Communication of Samples' abstract: 'The efficient communication of noisy data has applications in several areas of machine learning, such as neural compression or differential privacy, and is also known as reverse channel coding or the channel simulation problem. Here we propose two new coding schemes with practical advantages over existing approaches. First, we introduce ordered random coding (ORC) which uses a simple trick to reduce the coding cost of previous approaches. 
This scheme further illuminates a connection between schemes based on importance sampling and the so-called Poisson functional representation. Second, we describe a hybrid coding scheme which uses dithered quantization to more efficiently communicate samples from distributions with bounded support.' volume: 162 URL: https://proceedings.mlr.press/v162/theis22a.html PDF: https://proceedings.mlr.press/v162/theis22a/theis22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-theis22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lucas family: Theis - given: Noureldin Y family: Ahmed editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21308-21328 id: theis22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21308 lastpage: 21328 published: 2022-06-28 00:00:00 +0000 - title: 'Consistent Polyhedral Surrogates for Top-k Classification and Variants' abstract: 'Top-$k$ classification is a generalization of multiclass classification used widely in information retrieval, image classification, and other extreme classification settings. Several hinge-like (piecewise-linear) surrogates have been proposed for the problem, yet all are either non-convex or inconsistent. For the proposed hinge-like surrogates that are convex (i.e., polyhedral), we apply the recent embedding framework of Finocchiaro et al. (2019; 2022) to determine the prediction problem for which the surrogate is consistent. These problems can all be interpreted as variants of top-$k$ classification, which may be better aligned with some applications. We leverage this analysis to derive constraints on the conditional label distributions under which these proposed surrogates become consistent for top-$k$. It has been further suggested that every convex hinge-like surrogate must be inconsistent for top-$k$. Yet, we use the same embedding framework to give the first consistent polyhedral surrogate for this problem.' volume: 162 URL: https://proceedings.mlr.press/v162/thilagar22a.html PDF: https://proceedings.mlr.press/v162/thilagar22a/thilagar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-thilagar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Anish family: Thilagar - given: Rafael family: Frongillo - given: Jessica J family: Finocchiaro - given: Emma family: Goodwill editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21329-21359 id: thilagar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21329 lastpage: 21359 published: 2022-06-28 00:00:00 +0000 - title: 'On the Finite-Time Complexity and Practical Computation of Approximate Stationarity Concepts of Lipschitz Functions' abstract: 'We report a practical finite-time algorithmic scheme to compute approximately stationary points for nonconvex nonsmooth Lipschitz functions. In particular, we are interested in two kinds of approximate stationarity notions for nonconvex nonsmooth problems, i.e., Goldstein approximate stationarity (GAS) and near-approximate stationarity (NAS). 
For GAS, our scheme removes the unrealistic subgradient selection oracle assumption in (Zhang et al., 2020, Assumption 1) and computes GAS with the same finite-time complexity. For NAS, Davis & Drusvyatskiy (2019) showed that $\rho$-weakly convex functions admit finite-time computation, while Tian & So (2021) provided the matching impossibility results of dimension-free finite-time complexity for first-order methods. Complement to these developments, in this paper, we isolate a new class of functions that could be Clarke irregular (and thus not weakly convex anymore) and show that our new algorithmic scheme can compute NAS points for functions in that class within finite time. To demonstrate the wide applicability of our new theoretical framework, we show that $\rho$-margin SVM, $1$-layer, and $2$-layer ReLU neural networks, all being Clarke irregular, satisfy our new conditions.' volume: 162 URL: https://proceedings.mlr.press/v162/tian22a.html PDF: https://proceedings.mlr.press/v162/tian22a/tian22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tian22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lai family: Tian - given: Kaiwen family: Zhou - given: Anthony Man-Cho family: So editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21360-21379 id: tian22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21360 lastpage: 21379 published: 2022-06-28 00:00:00 +0000 - title: 'From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses' abstract: 'We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. 2012 for multi-armed bandits. Our method uses the quantile of a Q-value function posterior as upper confidence bound on the optimal Q-value function. For Bayes-UCBVI, we prove a regret bound of order $\widetilde{\mathcal{O}}(\sqrt{H^3SAT})$ where $H$ is the length of one episode, $S$ is the number of states, $A$ the number of actions, $T$ the number of episodes, that matches the lower-bound of $\Omega(\sqrt{H^3SAT})$ up to poly-$\log$ terms in $H,S,A,T$ for a large enough $T$. To the best of our knowledge, this is the first algorithm that obtains an optimal dependence on the horizon $H$ (and $S$) without the need of an involved Bernstein-like bonus or noise. Crucial to our analysis is a new fine-grained anti-concentration bound for a weighted Dirichlet sum that can be of independent interest. We then explain how Bayes-UCBVI can be easily extended beyond the tabular setting, exhibiting a strong link between our algorithm and Bayesian bootstrap (Rubin,1981).' 
volume: 162 URL: https://proceedings.mlr.press/v162/tiapkin22a.html PDF: https://proceedings.mlr.press/v162/tiapkin22a/tiapkin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tiapkin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniil family: Tiapkin - given: Denis family: Belomestny - given: Eric family: Moulines - given: Alexey family: Naumov - given: Sergey family: Samsonov - given: Yunhao family: Tang - given: Michal family: Valko - given: Pierre family: Menard editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21380-21431 id: tiapkin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21380 lastpage: 21431 published: 2022-06-28 00:00:00 +0000 - title: 'Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes' abstract: 'We propose a nonparametric factorization approach for sparsely observed tensors. The sparsity does not mean zero-valued entries are massive or dominated. Rather, it implies the observed entries are very few, and even fewer with the growth of the tensor; this is ubiquitous in practice. Compared with the existent works, our model not only leverages the structural information underlying the observed entry indices, but also provides extra interpretability and flexibility {—} it can simultaneously estimate a set of location factors about the intrinsic properties of the tensor nodes, and another set of sociability factors reflecting their extrovert activity in interacting with others; users are free to choose a trade-off between the two types of factors. Specifically, we use hierarchical Gamma processes and Poisson random measures to construct a tensor-valued process, which can freely sample the two types of factors to generate tensors and always guarantees an asymptotic sparsity. We then normalize the tensor process to obtain hierarchical Dirichlet processes to sample each observed entry index, and use a Gaussian process to sample the entry value as a nonlinear function of the factors, so as to capture both the sparse structure properties and complex node relationships. For efficient inference, we use Dirichlet process properties over finite sample partitions, density transformations, and random features to develop a stochastic variational estimation algorithm. We demonstrate the advantage of our method in several benchmark datasets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/tillinghast22a.html PDF: https://proceedings.mlr.press/v162/tillinghast22a/tillinghast22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tillinghast22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Conor family: Tillinghast - given: Zheng family: Wang - given: Shandian family: Zhe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21432-21448 id: tillinghast22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21432 lastpage: 21448 published: 2022-06-28 00:00:00 +0000 - title: 'Deciphering Lasso-based Classification Through a Large Dimensional Analysis of the Iterative Soft-Thresholding Algorithm' abstract: 'This paper proposes a theoretical analysis of a Lasso-based classification algorithm. Leveraging on a realistic regime where the dimension of the data $p$ and their number $n$ are of the same order of magnitude, the theoretical classification error is derived as a function of the data statistics. As a result, insights into the functioning of the Lasso in classification and its differences with competing algorithms are highlighted. Our work is based on an original novel analysis of the Iterative Soft-Thresholding Algorithm (ISTA), which may be of independent interest beyond the particular problem studied here and may be adapted to similar iterative schemes. A theoretical optimization of the model’s hyperparameters is also provided, which allows for the data- and time-consuming cross-validation to be avoided. Finally, several applications on synthetic and real data are provided to validate the theoretical study and justify its impact in the design and understanding of algorithms of practical interest.' volume: 162 URL: https://proceedings.mlr.press/v162/tiomoko22a.html PDF: https://proceedings.mlr.press/v162/tiomoko22a/tiomoko22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tiomoko22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Malik family: Tiomoko - given: Ekkehard family: Schnoor - given: Mohamed El Amine family: Seddik - given: Igor family: Colin - given: Aladin family: Virmaux editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21449-21477 id: tiomoko22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21449 lastpage: 21477 published: 2022-06-28 00:00:00 +0000 - title: 'Extended Unconstrained Features Model for Exploring Deep Neural Collapse' abstract: 'The modern strategy for training deep neural networks for classification tasks includes optimizing the network’s weights even after the training error vanishes to further push the training loss toward zero. Recently, a phenomenon termed “neural collapse" (NC) has been empirically observed in this training procedure. 
Specifically, it has been shown that the learned features (the output of the penultimate layer) of within-class samples converge to their mean, and the means of different classes exhibit a certain tight frame structure, which is also aligned with the last layer’s weights. Recent papers have shown that minimizers with this structure emerge when optimizing a simplified “unconstrained features model" (UFM) with a regularized cross-entropy loss. In this paper, we further analyze and extend the UFM. First, we study the UFM for the regularized MSE loss, and show that the minimizers’ features can have a more delicate structure than in the cross-entropy case. This affects also the structure of the weights. Then, we extend the UFM by adding another layer of weights as well as ReLU nonlinearity to the model and generalize our previous results. Finally, we empirically demonstrate the usefulness of our nonlinear extended UFM in modeling the NC phenomenon that occurs with practical networks.' volume: 162 URL: https://proceedings.mlr.press/v162/tirer22a.html PDF: https://proceedings.mlr.press/v162/tirer22a/tirer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tirer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tom family: Tirer - given: Joan family: Bruna editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21478-21505 id: tirer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21478 lastpage: 21505 published: 2022-06-28 00:00:00 +0000 - title: 'Object Permanence Emerges in a Random Walk along Memory' abstract: 'This paper proposes a self-supervised objective for learning representations that localize objects under occlusion - a property known as object permanence. A central question is the choice of learning signal in cases of total occlusion. Rather than directly supervising the locations of invisible objects, we propose a self-supervised objective that requires neither human annotation, nor assumptions about object dynamics. We show that object permanence can emerge by optimizing for temporal coherence of memory: we fit a Markov walk along a space-time graph of memories, where the states in each time step are non-Markovian features from a sequence encoder. This leads to a memory representation that stores occluded objects and predicts their motion, to better localize them. The resulting model outperforms existing approaches on several datasets of increasing complexity and realism, despite requiring minimal supervision, and hence being broadly applicable.' 
volume: 162 URL: https://proceedings.mlr.press/v162/tokmakov22a.html PDF: https://proceedings.mlr.press/v162/tokmakov22a/tokmakov22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tokmakov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pavel family: Tokmakov - given: Allan family: Jabri - given: Jie family: Li - given: Adrien family: Gaidon editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21506-21519 id: tokmakov22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21506 lastpage: 21519 published: 2022-06-28 00:00:00 +0000 - title: 'Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more' abstract: 'Coreset (or core-set) is a small weighted subset $Q$ of an input set $P$ with respect to a given monotonic function $f:\mathbb{R}\to\mathbb{R}$ that provably approximates its fitting loss $\sum_{p\in P}f(p\cdot x)$ to any given $x\in\mathbb{R}^d$. Using $Q$ we can obtain an approximation of $x^*$ that minimizes this loss, by running existing optimization algorithms on $Q$. In this work we provide: (i) A lower bound which proves that there are sets with no coresets smaller than $n=|P|$ for general monotonic loss functions. (ii) A proof that, with an additional common regularization term and under a natural assumption that holds e.g. for logistic regression and the sigmoid activation functions, a small coreset exists for any input $P$. (iii) A generic coreset construction algorithm that computes such a small coreset $Q$ in $O(nd+n\log n)$ time, and (iv) Experimental results with open-source code which demonstrate that our coresets are effective and are much smaller in practice than predicted in theory.' volume: 162 URL: https://proceedings.mlr.press/v162/tolochinksy22a.html PDF: https://proceedings.mlr.press/v162/tolochinksy22a/tolochinksy22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tolochinksy22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Elad family: Tolochinksy - given: Ibrahim family: Jubran - given: Dan family: Feldman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21520-21547 id: tolochinksy22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21520 lastpage: 21547 published: 2022-06-28 00:00:00 +0000 - title: 'Failure and success of the spectral bias prediction for Laplace Kernel Ridge Regression: the case of low-dimensional data' abstract: 'Recently, several theories including the replica method made predictions for the generalization error of Kernel Ridge Regression. In some regimes, they predict that the method has a ‘spectral bias’: decomposing the true function $f^*$ on the eigenbasis of the kernel, it fits well the coefficients associated with the O(P) largest eigenvalues, where $P$ is the size of the training set. This prediction works very well on benchmark data sets such as images, yet the assumptions these approaches make on the data are never satisfied in practice. 
To clarify when the spectral bias prediction holds, we first focus on a one-dimensional model where rigorous results are obtained and then use scaling arguments to generalize and test our findings in higher dimensions. Our predictions include the classification case $f(x)=$sign$(x_1)$ with a data distribution that vanishes at the decision boundary $p(x)\sim x_1^{\chi}$. For $\chi>0$ and a Laplace kernel, we find that (i) there exists a cross-over ridge $\lambda^*_{d,\chi}(P)\sim P^{-\frac{1}{d+\chi}}$ such that for $\lambda\gg \lambda^*_{d,\chi}(P)$, the replica method applies, but not for $\lambda\ll\lambda^*_{d,\chi}(P)$, (ii) in the ridge-less case, spectral bias predicts the correct training curve exponent only in the limit $d\rightarrow\infty$.' volume: 162 URL: https://proceedings.mlr.press/v162/tomasini22a.html PDF: https://proceedings.mlr.press/v162/tomasini22a/tomasini22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tomasini22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Umberto M family: Tomasini - given: Antonio family: Sclocchi - given: Matthieu family: Wyart editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21548-21583 id: tomasini22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21548 lastpage: 21583 published: 2022-06-28 00:00:00 +0000 - title: 'Quantifying and Learning Linear Symmetry-Based Disentanglement' abstract: 'The definition of Linear Symmetry-Based Disentanglement (LSBD) formalizes the notion of linearly disentangled representations, but there is currently no metric to quantify LSBD. Such a metric is crucial to evaluate LSBD methods and to compare them to previous understandings of disentanglement. We propose D_LSBD, a mathematically sound metric to quantify LSBD, and provide a practical implementation for SO(2) groups. Furthermore, from this metric we derive LSBD-VAE, a semi-supervised method to learn LSBD representations. We demonstrate the utility of our metric by showing that (1) common VAE-based disentanglement methods don’t learn LSBD representations, (2) LSBD-VAE, as well as other recent methods, can learn LSBD representations needing only limited supervision on transformations, and (3) various desirable properties expressed by existing disentanglement metrics are also achieved by LSBD representations.' 
volume: 162 URL: https://proceedings.mlr.press/v162/tonnaer22a.html PDF: https://proceedings.mlr.press/v162/tonnaer22a/tonnaer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tonnaer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Loek family: Tonnaer - given: Luis Armando Perez family: Rey - given: Vlado family: Menkovski - given: Mike family: Holenderski - given: Jim family: Portegies editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21584-21608 id: tonnaer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21584 lastpage: 21608 published: 2022-06-28 00:00:00 +0000 - title: 'A Temporal-Difference Approach to Policy Gradient Estimation' abstract: 'The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient. Most algorithms based on this theorem, in practice, break this assumption, introducing a distribution shift that can cause the convergence to poor solutions. In this paper, we propose a new approach of reconstructing the policy gradient from the start state without requiring a particular sampling strategy. The policy gradient calculation in this form can be simplified in terms of a gradient critic, which can be recursively estimated due to a new Bellman equation of gradients. By using temporal-difference updates of the gradient critic from an off-policy data stream, we develop the first estimator that side-steps the distribution shift issue in a model-free way. We prove that, under certain realizability conditions, our estimator is unbiased regardless of the sampling strategy. We empirically show that our technique achieves a superior bias-variance trade-off and performance in presence of off-policy samples.' volume: 162 URL: https://proceedings.mlr.press/v162/tosatto22a.html PDF: https://proceedings.mlr.press/v162/tosatto22a/tosatto22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tosatto22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Samuele family: Tosatto - given: Andrew family: Patterson - given: Martha family: White - given: Rupam family: Mahmood editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21609-21632 id: tosatto22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21609 lastpage: 21632 published: 2022-06-28 00:00:00 +0000 - title: 'Simple and near-optimal algorithms for hidden stratification and multi-group learning' abstract: 'Multi-group agnostic learning is a formal learning criterion that is concerned with the conditional risks of predictors within subgroups of a population. The criterion addresses recent practical concerns such as subgroup fairness and hidden stratification. This paper studies the structure of solutions to the multi-group learning problem, and provides simple and near-optimal algorithms for the learning problem.' 
volume: 162 URL: https://proceedings.mlr.press/v162/tosh22a.html PDF: https://proceedings.mlr.press/v162/tosh22a/tosh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tosh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Christopher J family: Tosh - given: Daniel family: Hsu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21633-21657 id: tosh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21633 lastpage: 21657 published: 2022-06-28 00:00:00 +0000 - title: 'Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization' abstract: 'Black-box model-based optimization (MBO) problems, where the goal is to find a design input that maximizes an unknown objective function, are ubiquitous in a wide range of domains, such as the design of proteins, DNA sequences, aircraft, and robots. Solving model-based optimization problems typically requires actively querying the unknown objective function on design proposals, which means physically building the candidate molecule, aircraft, or robot, testing it, and storing the result. This process can be expensive and time consuming, and one might instead prefer to optimize for the best design using only the data one already has. This setting—called offline MBO—poses substantial and different algorithmic challenges than more commonly studied online techniques. A number of recent works have demonstrated success with offline MBO for high-dimensional optimization problems using high-capacity deep neural networks. However, the lack of standardized benchmarks in this emerging field is making progress difficult to track. To address this, we present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods. Our benchmark includes a suite of diverse and realistic tasks derived from real-world optimization problems in biology, materials science, and robotics that present distinct challenges for offline MBO. Our benchmark and reference implementations are released at github.com/rail-berkeley/design-bench and github.com/rail-berkeley/design-baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/trabucco22a.html PDF: https://proceedings.mlr.press/v162/trabucco22a/trabucco22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-trabucco22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Brandon family: Trabucco - given: Xinyang family: Geng - given: Aviral family: Kumar - given: Sergey family: Levine editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21658-21676 id: trabucco22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21658 lastpage: 21676 published: 2022-06-28 00:00:00 +0000 - title: 'AnyMorph: Learning Transferable Policies By Inferring Agent Morphology' abstract: 'The prototypical approach to reinforcement learning involves training policies tailored to a particular agent from scratch for every new morphology.
Recent work aims to eliminate the re-training of policies by investigating whether a morphology-agnostic policy, trained on a diverse set of agents with similar task objectives, can be transferred to new agents with unseen morphologies without re-training. This is a challenging problem that required previous approaches to use hand-designed descriptions of the new agent’s morphology. Instead of hand-designing this description, we propose a data-driven method that learns a representation of morphology directly from the reinforcement learning objective. Ours is the first reinforcement learning algorithm that can train a policy to generalize to new agent morphologies without requiring a description of the agent’s morphology in advance. We evaluate our approach on the standard benchmark for agent-agnostic control, and improve over the current state of the art in zero-shot generalization to new agents. Importantly, our method attains good performance without an explicit description of morphology.' volume: 162 URL: https://proceedings.mlr.press/v162/trabucco22b.html PDF: https://proceedings.mlr.press/v162/trabucco22b/trabucco22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-trabucco22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Brandon family: Trabucco - given: Mariano family: Phielipp - given: Glen family: Berseth editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21677-21691 id: trabucco22b issued: date-parts: - 2022 - 6 - 28 firstpage: 21677 lastpage: 21691 published: 2022-06-28 00:00:00 +0000 - title: 'Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them' abstract: 'Making classifiers robust to adversarial examples is challenging. Thus, many works tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we show how to build a similarly robust (but inefficient) classifier for attacks at distance $\epsilon/2$. Our reduction is computationally inefficient, but preserves the data complexity of the original detector. The reduction thus cannot be directly used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated (namely a highly robust and data-efficient classifier). To illustrate, we revisit $14$ empirical detector defenses published over the past years. For $12/14$ defenses, we show that the claimed detection results imply an inefficient classifier with robustness far beyond the state-of-the-art— thus casting some doubts on the results’ validity. Finally, we show that our reduction applies in both directions: a robust classifier for attacks at distance $\epsilon/2$ implies an inefficient robust detector at distance $\epsilon$. Thus, we argue that robust classification and robust detection should be regarded as (near)-equivalent problems, if we disregard their computational complexity.' 
volume: 162 URL: https://proceedings.mlr.press/v162/tramer22a.html PDF: https://proceedings.mlr.press/v162/tramer22a/tramer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tramer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Florian family: Tramer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21692-21702 id: tramer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21692 lastpage: 21702 published: 2022-06-28 00:00:00 +0000 - title: 'Nesterov Accelerated Shuffling Gradient Method for Convex Optimization' abstract: 'In this paper, we propose Nesterov Accelerated Shuffling Gradient (NASG), a new algorithm for convex finite-sum minimization problems. Our method integrates the traditional Nesterov’s acceleration momentum with different shuffling sampling schemes. We show that our algorithm has an improved rate of $\mathcal{O}(1/T)$ using unified shuffling schemes, where $T$ is the number of epochs. This rate is better than that of any other shuffling gradient method in the convex regime. Our convergence analysis does not require a bounded domain assumption or a bounded gradient condition. For randomized shuffling schemes, we improve the convergence bound further. Under certain initial conditions, we show that our method converges faster in a small neighborhood of the solution. Numerical simulations demonstrate the efficiency of our algorithm.' volume: 162 URL: https://proceedings.mlr.press/v162/tran22a.html PDF: https://proceedings.mlr.press/v162/tran22a/tran22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tran22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Trang H family: Tran - given: Katya family: Scheinberg - given: Lam M family: Nguyen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21703-21732 id: tran22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21703 lastpage: 21732 published: 2022-06-28 00:00:00 +0000 - title: 'A Completely Tuning-Free and Robust Approach to Sparse Precision Matrix Estimation' abstract: 'Despite the vast literature on sparse Gaussian graphical models, current methods either are asymptotically tuning-free (which still require fine-tuning in practice) or hinge on computationally expensive methods (e.g., cross-validation) to determine the proper level of regularization. We propose a completely tuning-free approach for estimating sparse Gaussian graphical models. Our method uses model-agnostic regularization parameters to estimate each column of the target precision matrix and enjoys several desirable properties. Computationally, our estimator can be computed efficiently by linear programming. Theoretically, the proposed estimator achieves minimax optimal convergence rates under various norms. We further propose a second-stage enhancement with non-convex penalties which possesses the strong oracle property. Through comprehensive numerical studies, our methods demonstrate favorable statistical performance. 
Remarkably, our methods exhibit strong robustness to the violation of the Gaussian assumption and significantly outperform competing methods in the heavy-tailed settings.' volume: 162 URL: https://proceedings.mlr.press/v162/tran22b.html PDF: https://proceedings.mlr.press/v162/tran22b/tran22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tran22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chau family: Tran - given: Guo family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21733-21750 id: tran22b issued: date-parts: - 2022 - 6 - 28 firstpage: 21733 lastpage: 21750 published: 2022-06-28 00:00:00 +0000 - title: 'Tackling covariate shift with node-based Bayesian neural networks' abstract: 'Bayesian neural networks (BNNs) promise improved generalization under covariate shift by providing principled probabilistic representations of epistemic uncertainty. However, weight-based BNNs often struggle with high computational complexity of large-scale architectures and datasets. Node-based BNNs have recently been introduced as scalable alternatives, which induce epistemic uncertainty by multiplying each hidden node with latent random variables, while learning a point-estimate of the weights. In this paper, we interpret these latent noise variables as implicit representations of simple and domain-agnostic data perturbations during training, producing BNNs that perform well under covariate shift due to input corruptions. We observe that the diversity of the implicit corruptions depends on the entropy of the latent variables, and propose a straightforward approach to increase the entropy of these variables during training. We evaluate the method on out-of-distribution image classification benchmarks, and show improved uncertainty estimation of node-based BNNs under covariate shift due to input perturbations. As a side effect, the method also provides robustness against noisy training labels.' volume: 162 URL: https://proceedings.mlr.press/v162/trinh22a.html PDF: https://proceedings.mlr.press/v162/trinh22a/trinh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-trinh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Trung Q family: Trinh - given: Markus family: Heinonen - given: Luigi family: Acerbi - given: Samuel family: Kaski editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21751-21775 id: trinh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21751 lastpage: 21775 published: 2022-06-28 00:00:00 +0000 - title: 'Fenrir: Physics-Enhanced Regression for Initial Value Problems' abstract: 'We show how probabilistic numerics can be used to convert an initial value problem into a Gauss–Markov process parametrised by the dynamics of the initial value problem. Consequently, the often difficult problem of parameter estimation in ordinary differential equations is reduced to hyper-parameter estimation in Gauss–Markov regression, which tends to be considerably easier. 
The method’s relation and benefits in comparison to classical numerical integration and gradient matching approaches are elucidated. In particular, the method can, in contrast to gradient matching, handle partial observations, and has certain routes for escaping local optima not available to classical numerical integration. Experimental results demonstrate that the method is on par with or moderately better than competing approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/tronarp22a.html PDF: https://proceedings.mlr.press/v162/tronarp22a/tronarp22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tronarp22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Filip family: Tronarp - given: Nathanael family: Bosch - given: Philipp family: Hennig editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21776-21794 id: tronarp22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21776 lastpage: 21794 published: 2022-06-28 00:00:00 +0000 - title: 'Interpretable Off-Policy Learning via Hyperbox Search' abstract: 'Personalized treatment decisions have become an integral part of modern medicine. The aim is to make treatment decisions based on individual patient characteristics. Numerous methods have been developed for learning such policies from observational data that achieve the best outcome across a certain policy class. Yet these methods are rarely interpretable. However, interpretability is often a prerequisite for policy learning in clinical practice. In this paper, we propose an algorithm for interpretable off-policy learning via hyperbox search. In particular, our policies can be represented in disjunctive normal form (i.e., OR-of-ANDs) and are thus intelligible. We prove a universal approximation theorem that shows that our policy class is flexible enough to approximate any measurable function arbitrarily well. For optimization, we develop a tailored column generation procedure within a branch-and-bound framework. Using a simulation study, we demonstrate that our algorithm outperforms state-of-the-art methods from interpretable off-policy learning in terms of regret. Using real-world clinical data, we perform a user study with actual clinical experts, who rate our policies as highly interpretable.' 
volume: 162 URL: https://proceedings.mlr.press/v162/tschernutter22a.html PDF: https://proceedings.mlr.press/v162/tschernutter22a/tschernutter22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tschernutter22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniel family: Tschernutter - given: Tobias family: Hatt - given: Stefan family: Feuerriegel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21795-21827 id: tschernutter22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21795 lastpage: 21827 published: 2022-06-28 00:00:00 +0000 - title: 'FriendlyCore: Practical Differentially Private Aggregation' abstract: 'Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points that is required for accurate results. We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points ${\cal D}$ from an unrestricted (pseudo) metric space as input. When ${\cal D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a “stable” subset ${\cal C} \subseteq {\cal D}$ that includes all points, except possibly few outliers, and is guaranteed to have diameter $r$. $\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy. Surprisingly, $\mathsf{FriendlyCore}$ is light-weight with no dependence on the dimension. We empirically demonstrate its advantages in boosting the accuracy of mean estimation and clustering tasks such as $k$-means and $k$-GMM, outperforming tailored methods.' volume: 162 URL: https://proceedings.mlr.press/v162/tsfadia22a.html PDF: https://proceedings.mlr.press/v162/tsfadia22a/tsfadia22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tsfadia22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Eliad family: Tsfadia - given: Edith family: Cohen - given: Haim family: Kaplan - given: Yishay family: Mansour - given: Uri family: Stemmer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21828-21863 id: tsfadia22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21828 lastpage: 21863 published: 2022-06-28 00:00:00 +0000 - title: 'Pairwise Conditional Gradients without Swap Steps and Sparser Kernel Herding' abstract: 'The Pairwise Conditional Gradients (PCG) algorithm is a powerful extension of the Frank-Wolfe algorithm leading to particularly sparse solutions, which makes PCG very appealing for problems such as sparse signal recovery, sparse regression, and kernel herding. Unfortunately, PCG exhibits so-called swap steps that might not provide sufficient primal progress. The number of these bad steps is bounded by a function in the dimension and as such known guarantees do not generalize to the infinite-dimensional case, which would be needed for kernel herding. 
We propose a new variant of PCG, the so-called Blended Pairwise Conditional Gradients (BPCG). This new algorithm does not exhibit any swap steps, is very easy to implement, and does not require any internal gradient alignment procedures. The convergence rate of BPCG is basically that of PCG if no drop steps would occur and as such is no worse than PCG but improves and provides new rates in many cases. Moreover, we observe in the numerical experiments that BPCG’s solutions are much sparser than those of PCG. We apply BPCG to the kernel herding setting, where we derive nice quadrature rules and provide numerical results demonstrating the performance of our method.' volume: 162 URL: https://proceedings.mlr.press/v162/tsuji22a.html PDF: https://proceedings.mlr.press/v162/tsuji22a/tsuji22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tsuji22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kazuma K family: Tsuji - given: Ken’Ichiro family: Tanaka - given: Sebastian family: Pokutta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21864-21883 id: tsuji22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21864 lastpage: 21883 published: 2022-06-28 00:00:00 +0000 - title: 'Prototype Based Classification from Hierarchy to Fairness' abstract: 'Artificial neural nets can represent and classify many types of high-dimensional data but are often tailored to particular applications – e.g., for “fair” or “hierarchical” classification. Once an architecture has been selected, it is often difficult for humans to adjust models for a new task; for example, a hierarchical classifier cannot be easily transformed into a fair classifier that shields a protected field. Our contribution in this work is a new neural network architecture, the concept subspace network (CSN), which generalizes existing specialized classifiers to produce a unified model capable of learning a spectrum of multi-concept relationships. We demonstrate that CSNs reproduce state-of-the-art results in fair classification when enforcing concept independence, may be transformed into hierarchical classifiers, or may even reconcile fairness and hierarchy within a single classifier. The CSN is inspired by and matches the performance of existing prototype-based classifiers that promote interpretability.' volume: 162 URL: https://proceedings.mlr.press/v162/tucker22a.html PDF: https://proceedings.mlr.press/v162/tucker22a/tucker22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-tucker22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mycal family: Tucker - given: Julie A. 
family: Shah editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21884-21900 id: tucker22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21884 lastpage: 21900 published: 2022-06-28 00:00:00 +0000 - title: 'Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures' abstract: 'Cheung and Piliouras (2020) recently showed that two variants of the Multiplicative Weights Update method - OMWU and MWU - display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we introduce a new framework for learning last-iterate convergence to Nash Equilibria in games, where the update rule’s coefficients (learning rates) along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: the game signature. We construct the latter using a new decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We compare the performance of various update rules when their coefficients are learnt, and show that the RL policy is able to exploit the game signature across a wide range of game types. In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case and has local convergence guarantees for zero-sum bimatrix games, and we show that it enjoys competitive performance on both zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt.' volume: 162 URL: https://proceedings.mlr.press/v162/vadori22a.html PDF: https://proceedings.mlr.press/v162/vadori22a/vadori22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vadori22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nelson family: Vadori - given: Rahul family: Savani - given: Thomas family: Spooner - given: Sumitra family: Ganesh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21901-21926 id: vadori22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21901 lastpage: 21926 published: 2022-06-28 00:00:00 +0000 - title: 'Self-Supervised Models of Audio Effectively Explain Human Cortical Responses to Speech' abstract: 'Self-supervised language models are very effective at predicting high-level cortical responses during language comprehension. However, the best current models of lower-level auditory processing in the human brain rely on either hand-constructed acoustic filters or representations from supervised audio neural networks. In this work, we capitalize on the progress of self-supervised speech representation learning (SSL) to create new state-of-the-art models of the human auditory system. Compared against acoustic baselines, phonemic features, and supervised models, representations from the middle layers of self-supervised models (APC, wav2vec, wav2vec 2.0, and HuBERT) consistently yield the best prediction performance for fMRI recordings within the auditory cortex (AC). 
Brain areas involved in low-level auditory processing exhibit a preference for earlier SSL model layers, whereas higher-level semantic areas prefer later layers. We show that these trends are due to the models’ ability to encode information at multiple linguistic levels (acoustic, phonetic, and lexical) along their representation depth. Overall, these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.' volume: 162 URL: https://proceedings.mlr.press/v162/vaidya22a.html PDF: https://proceedings.mlr.press/v162/vaidya22a/vaidya22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vaidya22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aditya R family: Vaidya - given: Shailee family: Jain - given: Alexander family: Huth editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21927-21944 id: vaidya22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21927 lastpage: 21944 published: 2022-06-28 00:00:00 +0000 - title: 'Path-Gradient Estimators for Continuous Normalizing Flows' abstract: 'Recent work has established a path-gradient estimator for simple variational Gaussian distributions and has argued that the path-gradient is particularly beneficial in the regime in which the variational distribution approaches the exact target distribution. In many applications, this regime can however not be reached by a simple Gaussian variational distribution. In this work, we overcome this crucial limitation by proposing a path-gradient estimator for the considerably more expressive variational family of continuous normalizing flows. We outline an efficient algorithm to calculate this estimator and establish its superior performance empirically.' volume: 162 URL: https://proceedings.mlr.press/v162/vaitl22a.html PDF: https://proceedings.mlr.press/v162/vaitl22a/vaitl22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vaitl22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lorenz family: Vaitl - given: Kim Andrea family: Nicoli - given: Shinichi family: Nakajima - given: Pan family: Kessel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21945-21959 id: vaitl22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21945 lastpage: 21959 published: 2022-06-28 00:00:00 +0000 - title: 'Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning' abstract: 'Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications for regression and optimization. It is well known that a major downside for kernel-based models is the high computational cost; given a dataset of $n$ samples, the cost grows as $\mathcal{O}(n^3)$. Existing sparse approximation methods can yield a significant reduction in the computational cost, effectively reducing the actual cost down to as low as $\mathcal{O}(n)$ in certain cases. 
Despite this remarkable empirical success, significant gaps remain in the existing results for the analytical bounds on the error due to approximation. In this work, we provide novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method, which we establish using novel interpretations of the approximate (surrogate) posterior variance of the models. Our confidence intervals lead to improved performance bounds in both regression and optimization problems.' volume: 162 URL: https://proceedings.mlr.press/v162/vakili22a.html PDF: https://proceedings.mlr.press/v162/vakili22a/vakili22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vakili22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sattar family: Vakili - given: Jonathan family: Scarlett - given: Da-Shan family: Shiu - given: Alberto family: Bernacchia editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21960-21983 id: vakili22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21960 lastpage: 21983 published: 2022-06-28 00:00:00 +0000 - title: 'EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning' abstract: 'Distributed Mean Estimation (DME) is a central building block in federated learning, where clients send local gradients to a parameter server for averaging and updating the model. Due to communication constraints, clients often use lossy compression techniques to compress the gradients, resulting in estimation inaccuracies. DME is more challenging when clients have diverse network conditions, such as constrained communication budgets and packet losses. In such settings, DME techniques often incur a significant increase in the estimation error leading to degraded learning performance. In this work, we propose a robust DME technique named EDEN that naturally handles heterogeneous communication budgets and packet losses. We derive appealing theoretical guarantees for EDEN and evaluate it empirically. Our results demonstrate that EDEN consistently improves over state-of-the-art DME techniques.' 
volume: 162 URL: https://proceedings.mlr.press/v162/vargaftik22a.html PDF: https://proceedings.mlr.press/v162/vargaftik22a/vargaftik22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vargaftik22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shay family: Vargaftik - given: Ran Ben family: Basat - given: Amit family: Portnoy - given: Gal family: Mendelson - given: Yaniv Ben family: Itzhak - given: Michael family: Mitzenmacher editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 21984-22014 id: vargaftik22a issued: date-parts: - 2022 - 6 - 28 firstpage: 21984 lastpage: 22014 published: 2022-06-28 00:00:00 +0000 - title: 'Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent' abstract: 'We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $\kappa$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an $\tilde{O} \left(\exp \left( \nicefrac{-T}{\kappa} \right) + \nicefrac{\sigma^2}{T} \right)$ rate, without knowing $\sigma^2$. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower-bounds) that SGD with SLS converges at the desired rate, but only to a neighbourhood of the solution. On the other hand, we prove that SGD with an offline estimate of the smoothness converges to the minimizer. However, its rate is slowed down proportional to the estimation error. Next, we prove that SGD with Nesterov acceleration and exponential step-sizes (referred to as ASGD) can achieve the near-optimal $\tilde{O} \left(\exp \left( \nicefrac{-T}{\sqrt{\kappa}} \right) + \nicefrac{\sigma^2}{T} \right)$ rate, without knowledge of $\sigma^2$. When used with offline estimates of the smoothness and strong-convexity, ASGD still converges to the solution, albeit at a slower rate. Finally, we empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.' 
volume: 162 URL: https://proceedings.mlr.press/v162/vaswani22a.html PDF: https://proceedings.mlr.press/v162/vaswani22a/vaswani22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vaswani22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sharan family: Vaswani - given: Benjamin family: Dubois-Taine - given: Reza family: Babanezhad editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22015-22059 id: vaswani22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22015 lastpage: 22059 published: 2022-06-28 00:00:00 +0000 - title: 'Correlation Clustering via Strong Triadic Closure Labeling: Fast Approximation Algorithms and Practical Lower Bounds' abstract: 'Correlation clustering is a widely studied framework for clustering based on pairwise similarity and dissimilarity scores, but its best approximation algorithms rely on impractical linear programming relaxations. We present faster approximation algorithms that avoid these relaxations, for two well-studied special cases: cluster editing and cluster deletion. We accomplish this by drawing new connections to edge labeling problems related to the principle of strong triadic closure. This leads to faster and more practical linear programming algorithms, as well as extremely scalable combinatorial techniques, including the first combinatorial approximation algorithm for cluster deletion. In practice, our algorithms produce approximate solutions that nearly match the best algorithms in quality, while scaling to problems that are orders of magnitude larger.' volume: 162 URL: https://proceedings.mlr.press/v162/veldt22a.html PDF: https://proceedings.mlr.press/v162/veldt22a/veldt22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-veldt22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nate family: Veldt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22060-22083 id: veldt22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22060 lastpage: 22083 published: 2022-06-28 00:00:00 +0000 - title: 'The CLRS Algorithmic Reasoning Benchmark' abstract: 'Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms. Several important works have investigated whether neural networks can effectively reason like algorithms, typically by learning to execute them. The common trend in the area, however, is to generate targeted kinds of algorithmic data to evaluate specific hypotheses, making results hard to transfer across publications, and increasing the barrier of entry. To consolidate progress and work towards unified evaluation, we propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook. Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms. 
We perform extensive experiments to demonstrate how several popular algorithmic reasoning baselines perform on these tasks, and consequently, highlight links to several open challenges. Our library is readily available at https://github.com/deepmind/clrs.' volume: 162 URL: https://proceedings.mlr.press/v162/velickovic22a.html PDF: https://proceedings.mlr.press/v162/velickovic22a/velickovic22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-velickovic22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Petar family: Veličković - given: Adrià Puigdomènech family: Badia - given: David family: Budden - given: Razvan family: Pascanu - given: Andrea family: Banino - given: Misha family: Dashevskiy - given: Raia family: Hadsell - given: Charles family: Blundell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22084-22102 id: velickovic22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22084 lastpage: 22102 published: 2022-06-28 00:00:00 +0000 - title: 'Bregman Power k-Means for Clustering Exponential Family Data' abstract: 'Recent progress in center-based clustering algorithms combats poor local minima by implicit annealing through a family of generalized means. These methods are variations of Lloyd’s celebrated k-means algorithm, and are most appropriate for spherical clusters such as those arising from Gaussian data. In this paper, we bridge these new algorithmic advances to classical work on hard clustering under Bregman divergences, which enjoy a bijection to exponential family distributions and are thus well-suited for clustering objects arising from a breadth of data generating mechanisms. The elegant properties of Bregman divergences allow us to maintain closed form updates in a simple and transparent algorithm, and moreover lead to new theoretical arguments for establishing finite sample bounds that relax the bounded support assumption made in the existing state of the art. Additionally, we consider thorough empirical analyses on simulated experiments and a case study on rainfall data, finding that the proposed method outperforms existing peer methods in a variety of non-Gaussian data settings.' volume: 162 URL: https://proceedings.mlr.press/v162/vellal22a.html PDF: https://proceedings.mlr.press/v162/vellal22a/vellal22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vellal22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adithya family: Vellal - given: Saptarshi family: Chakraborty - given: Jason Q family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22103-22119 id: vellal22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22103 lastpage: 22119 published: 2022-06-28 00:00:00 +0000 - title: 'Estimation in Rotationally Invariant Generalized Linear Models via Approximate Message Passing' abstract: 'We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices. 
Since these matrices can have an arbitrary spectral distribution, this model is well suited for capturing complex correlation structures which often arise in applications. We propose a novel family of approximate message passing (AMP) algorithms for signal estimation, and rigorously characterize their performance in the high-dimensional limit via a state evolution recursion. Our rotationally invariant AMP has complexity of the same order as the existing AMP derived under the restrictive assumption of a Gaussian design; our algorithm also recovers this existing AMP as a special case. Numerical results showcase a performance close to Vector AMP (which is conjectured to be Bayes-optimal in some settings), but obtained with a much lower complexity, as the proposed algorithm does not require a computationally expensive singular value decomposition.' volume: 162 URL: https://proceedings.mlr.press/v162/venkataramanan22a.html PDF: https://proceedings.mlr.press/v162/venkataramanan22a/venkataramanan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-venkataramanan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ramji family: Venkataramanan - given: Kevin family: Kögler - given: Marco family: Mondelli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22120-22144 id: venkataramanan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22120 lastpage: 22144 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Optimization under Stochastic Delayed Feedback' abstract: 'Bayesian optimization (BO) is a widely-used sequential method for zeroth-order optimization of complex and expensive-to-compute black-box functions. The existing BO methods assume that the function evaluation (feedback) is available to the learner immediately or after a fixed delay. Such assumptions may not be practical in many real-life problems like online recommendations, clinical trials, and hyperparameter tuning where feedback is available after a random delay. To benefit from the experimental parallelization in these problems, the learner needs to start new function evaluations without waiting for delayed feedback. In this paper, we consider the BO under stochastic delayed feedback problem. We propose algorithms with sub-linear regret guarantees that efficiently address the dilemma of selecting new function queries while waiting for randomly delayed feedback. Building on our results, we also make novel contributions to batch BO and contextual Gaussian process bandits. Experiments on synthetic and real-life datasets verify the performance of our algorithms.' 
volume: 162 URL: https://proceedings.mlr.press/v162/verma22a.html PDF: https://proceedings.mlr.press/v162/verma22a/verma22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-verma22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Arun family: Verma - given: Zhongxiang family: Dai - given: Bryan Kian Hsiang family: Low editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22145-22167 id: verma22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22145 lastpage: 22167 published: 2022-06-28 00:00:00 +0000 - title: 'VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis' abstract: 'Scene graphs are powerful abstractions that capture relationships between objects in images by modeling objects as nodes and relationships as edges. Generation of realistic synthetic scene graphs has applications like scene synthesis and data augmentation for supervised learning. Existing graph generative models are predominantly targeted toward molecular graphs, leveraging the limited vocabulary of atoms and bonds and also the well-defined semantics of chemical compounds. In contrast, scene graphs have much larger object and relation vocabularies, and their semantics are latent. To address this challenge, we propose a variational autoencoder for scene graphs, which is optimized for the maximum mean discrepancy (MMD) between the ground truth scene graph distribution and distribution of the generated scene graphs. Our method views a scene graph as a collection of star graphs and encodes it into a latent representation of the underlying stars. The decoder generates scene graphs by learning to sample the component stars and edges between them. Our experiments show that our method is able to mimic the underlying scene graph generative process more accurately than several state-of-the-art baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/verma22b.html PDF: https://proceedings.mlr.press/v162/verma22b/verma22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-verma22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tathagat family: Verma - given: Abir family: De - given: Yateesh family: Agrawal - given: Vishwa family: Vinay - given: Soumen family: Chakrabarti editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22168-22183 id: verma22b issued: date-parts: - 2022 - 6 - 28 firstpage: 22168 lastpage: 22183 published: 2022-06-28 00:00:00 +0000 - title: 'Calibrated Learning to Defer with One-vs-All Classifiers' abstract: 'The learning to defer (L2D) framework has the potential to make AI systems safer. For a given input, the system can defer the decision to a human if the human is more likely than the model to take the correct action. We study the calibration of L2D systems, investigating if the probabilities they output are sound. We find that Mozannar & Sontag’s (2020) multiclass framework is not calibrated with respect to expert correctness. 
Moreover, it is not even guaranteed to produce valid probabilities due to its parameterization being degenerate for this purpose. We propose an L2D system based on one-vs-all classifiers that is able to produce calibrated probabilities of expert correctness. Furthermore, our loss function is also a consistent surrogate for multiclass L2D, like Mozannar & Sontag’s (2020). Our experiments verify that not only is our system calibrated, but this benefit comes at no cost to accuracy. Our model’s accuracy is always comparable (and often superior) to Mozannar & Sontag’s (2020) model’s in tasks ranging from hate speech detection to galaxy classification to diagnosis of skin lesions.' volume: 162 URL: https://proceedings.mlr.press/v162/verma22c.html PDF: https://proceedings.mlr.press/v162/verma22c/verma22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-verma22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rajeev family: Verma - given: Eric family: Nalisnick editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22184-22202 id: verma22c issued: date-parts: - 2022 - 6 - 28 firstpage: 22184 lastpage: 22202 published: 2022-06-28 00:00:00 +0000 - title: 'Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation' abstract: 'We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP). Under minimal assumptions, it obtains sublinear regret, is computationally efficient, and uses stationary policies. To our knowledge, this is the first such algorithm in the LFA literature (for SSP or other formulations). Our algorithm is a special case of a more general one, which achieves regret square root in the number of episodes given access to a computation oracle.' volume: 162 URL: https://proceedings.mlr.press/v162/vial22a.html PDF: https://proceedings.mlr.press/v162/vial22a/vial22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vial22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daniel family: Vial - given: Advait family: Parulekar - given: Sanjay family: Shakkottai - given: R family: Srikant editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22203-22233 id: vial22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22203 lastpage: 22233 published: 2022-06-28 00:00:00 +0000 - title: 'On Implicit Bias in Overparameterized Bilevel Optimization' abstract: 'Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems involve inner and outer parameters, each optimized for its own objective. Often, at least one of the two levels is underspecified and there are multiple ways to choose among equivalent optima. 
Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of different gradient-based algorithms for jointly optimizing the inner and outer parameters. We delineate two standard BLO methods—cold-start and warm-start BLO—and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the solutions from warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer optimization variables are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.' volume: 162 URL: https://proceedings.mlr.press/v162/vicol22a.html PDF: https://proceedings.mlr.press/v162/vicol22a/vicol22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vicol22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Paul family: Vicol - given: Jonathan P family: Lorraine - given: Fabian family: Pedregosa - given: David family: Duvenaud - given: Roger B family: Grosse editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22234-22259 id: vicol22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22234 lastpage: 22259 published: 2022-06-28 00:00:00 +0000 - title: 'Multiclass learning with margin: exponential rates with no bias-variance trade-off' abstract: 'We study the behavior of error bounds for multiclass classification under suitable margin conditions. For a wide variety of methods we prove that the classification error under a hard-margin condition decreases exponentially fast without any bias-variance trade-off. Different convergence rates can be obtained in correspondence of different margin assumptions. With a self-contained and instructive analysis we are able to generalize known results from the binary to the multiclass setting.' volume: 162 URL: https://proceedings.mlr.press/v162/vigogna22a.html PDF: https://proceedings.mlr.press/v162/vigogna22a/vigogna22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vigogna22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Stefano family: Vigogna - given: Giacomo family: Meanti - given: Ernesto family: De Vito - given: Lorenzo family: Rosasco editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22260-22269 id: vigogna22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22260 lastpage: 22269 published: 2022-06-28 00:00:00 +0000 - title: 'Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning' abstract: 'Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. 
Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method’s superior performance on a variety of autonomous driving tasks in simulation.' volume: 162 URL: https://proceedings.mlr.press/v162/villaflor22a.html PDF: https://proceedings.mlr.press/v162/villaflor22a/villaflor22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-villaflor22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Adam R family: Villaflor - given: Zhe family: Huang - given: Swapnil family: Pande - given: John M family: Dolan - given: Jeff family: Schneider editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22270-22283 id: villaflor22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22270 lastpage: 22283 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Nonparametrics for Offline Skill Discovery' abstract: 'Skills or low-level policies in reinforcement learning are temporally extended actions that can speed up learning and enable complex behaviours. Recent work in offline reinforcement learning and imitation learning has proposed several techniques for skill discovery from a set of expert trajectories. While these methods are promising, the number K of skills to discover is always a fixed hyperparameter, which requires either prior knowledge about the environment or an additional parameter search to tune it. We first propose a method for offline learning of options (a particular skill framework) exploiting advances in variational inference and continuous relaxations. We then highlight an unexplored connection between Bayesian nonparametrics and offline skill discovery, and show how to obtain a nonparametric version of our model. This version is tractable thanks to a carefully structured approximate posterior with a dynamically-changing number of options, removing the need to specify K. We also show how our nonparametric extension can be applied in other skill frameworks, and empirically demonstrate that our method can outperform state-of-the-art offline skill learning algorithms across a variety of environments.' 
volume: 162 URL: https://proceedings.mlr.press/v162/villecroze22a.html PDF: https://proceedings.mlr.press/v162/villecroze22a/villecroze22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-villecroze22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Valentin family: Villecroze - given: Harry family: Braviner - given: Panteha family: Naderian - given: Chris family: Maddison - given: Gabriel family: Loaiza-Ganem editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22284-22299 id: villecroze22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22284 lastpage: 22299 published: 2022-06-28 00:00:00 +0000 - title: 'Hermite Polynomial Features for Private Data Generation' abstract: 'Kernel mean embedding is a useful tool to compare probability measures. Despite its usefulness, kernel mean embedding considers infinite-dimensional features, which are challenging to handle in the context of differentially private data generation. A recent work, DP-MERF (Harder et al., 2021), proposes to approximate the kernel mean embedding of data distribution using finite-dimensional random features, which yields an analytically tractable sensitivity of approximate kernel mean embedding. However, the required number of random features in DP-MERF is excessively high, often ten thousand to a hundred thousand, which worsens the sensitivity of the approximate kernel mean embedding. To improve the sensitivity, we propose to replace random features with Hermite polynomial features. Unlike the random features, the Hermite polynomial features are ordered, where the features at the low orders contain more information on the distribution than those at the high orders. Hence, a relatively low order of Hermite polynomial features can more accurately approximate the mean embedding of the data distribution compared to a significantly higher number of random features. As a result, the Hermite polynomial features help us to improve the privacy-accuracy trade-off compared to DP-MERF, as demonstrated on several heterogeneous tabular datasets, as well as several image benchmark datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/vinaroz22a.html PDF: https://proceedings.mlr.press/v162/vinaroz22a/vinaroz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vinaroz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Margarita family: Vinaroz - given: Mohammad-Amin family: Charusaie - given: Frederik family: Harder - given: Kamil family: Adamczewski - given: Mi Jung family: Park editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22300-22324 id: vinaroz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22300 lastpage: 22324 published: 2022-06-28 00:00:00 +0000 - title: 'What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us?' abstract: 'Studying neural network loss landscapes provides insights into the nature of the underlying optimization problems. 
Unfortunately, loss landscapes are notoriously difficult to visualize in a human-comprehensible fashion. One common way to address this problem is to plot linear slices of the landscape, for example from the initial state of the network to the final state after optimization. On the basis of this analysis, prior work has drawn broader conclusions about the difficulty of the optimization problem. In this paper, we put inferences of this kind to the test, systematically evaluating how linear interpolation and final performance vary when altering the data, choice of initialization, and other optimizer and architecture design choices. Further, we use linear interpolation to study the role played by individual layers and substructures of the network. We find that certain layers are more sensitive to the choice of initialization, but that the shape of the linear path is not indicative of the changes in test accuracy of the model. Our results cast doubt on the broader intuition that the presence or absence of barriers when interpolating necessarily relates to the success of optimization.' volume: 162 URL: https://proceedings.mlr.press/v162/vlaar22a.html PDF: https://proceedings.mlr.press/v162/vlaar22a/vlaar22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vlaar22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tiffany J family: Vlaar - given: Jonathan family: Frankle editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22325-22341 id: vlaar22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22325 lastpage: 22341 published: 2022-06-28 00:00:00 +0000 - title: 'Multirate Training of Neural Networks' abstract: 'We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained on different time scales, where slow parts are updated less frequently. By choosing appropriate partitionings we can obtain substantial computational speed-up for transfer learning tasks. We show for applications in vision and NLP that we can fine-tune deep neural networks in almost half the time, without reducing the generalization performance of the resulting models. We analyze the convergence properties of our multirate scheme and draw a comparison with vanilla SGD. We also discuss splitting choices for the neural network parameters which could enhance generalization performance when neural networks are trained from scratch. A multirate approach can be used to learn different features present in the data and as a form of regularization. Our paper unlocks the potential of using multirate techniques for neural network training and provides several starting points for future work in this area.' 
volume: 162 URL: https://proceedings.mlr.press/v162/vlaar22b.html PDF: https://proceedings.mlr.press/v162/vlaar22b/vlaar22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-vlaar22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tiffany J family: Vlaar - given: Benedict family: Leimkuhler editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22342-22360 id: vlaar22b issued: date-parts: - 2022 - 6 - 28 firstpage: 22342 lastpage: 22360 published: 2022-06-28 00:00:00 +0000 - title: 'Provably Adversarially Robust Nearest Prototype Classifiers' abstract: 'Nearest prototype classifiers (NPCs) assign to each input point the label of the nearest prototype with respect to a chosen distance metric. A direct advantage of NPCs is that the decisions are interpretable. Previous work could provide lower bounds on the minimal adversarial perturbation in the $\ell_p$-threat model when using the same $\ell_p$-distance for the NPCs. In this paper we provide a complete discussion on the complexity when using $\ell_p$-distances for decision and $\ell_q$-threat models for certification for $p,q \in \{1,2,\infty\}$. In particular we provide scalable algorithms for the exact computation of the minimal adversarial perturbation when using $\ell_2$-distance and improved lower bounds in other cases. Using efficient improved lower bounds we train our \textbf{P}rovably adversarially robust \textbf{NPC} (PNPC) for MNIST, which has better $\ell_2$-robustness guarantees than neural networks. Additionally, we show, to the best of our knowledge, the first certification results w.r.t. the LPIPS perceptual metric, which has been argued to be a more realistic threat model for image classification than $\ell_p$-balls. On CIFAR10, our PNPC has higher certified robust accuracy than the empirical robust accuracy reported in \cite{laidlaw2021perceptual}. The code is available in our \href{https://github.com/vvoracek/Provably-Adversarially-Robust-Nearest-Prototype-Classifiers}{repository}.' volume: 162 URL: https://proceedings.mlr.press/v162/voracek22a.html PDF: https://proceedings.mlr.press/v162/voracek22a/voracek22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-voracek22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Václav family: Voráček - given: Matthias family: Hein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22361-22383 id: voracek22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22361 lastpage: 22383 published: 2022-06-28 00:00:00 +0000 - title: 'First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach' abstract: 'Obtaining first-order regret bounds—regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance—is a core question in sequential decision-making. While such bounds exist in many settings, they have proven elusive in reinforcement learning with large state spaces.
In this work we address this gap, and show that it is possible to obtain regret scaling as $\widetilde{\mathcal{O}}(\sqrt{d^3 H^3 \cdot V_1^\star \cdot K} + d^{3.5}H^3\log K)$ in reinforcement learning with large state spaces, namely the linear MDP setting. Here $V_1^\star$ is the value of the optimal policy and $K$ is the number of episodes. We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result, and instead develop a novel robust self-normalized concentration bound based on the robust Catoni mean estimator, which may be of independent interest.' volume: 162 URL: https://proceedings.mlr.press/v162/wagenmaker22a.html PDF: https://proceedings.mlr.press/v162/wagenmaker22a/wagenmaker22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wagenmaker22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrew J family: Wagenmaker - given: Yifang family: Chen - given: Max family: Simchowitz - given: Simon family: Du - given: Kevin family: Jamieson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22384-22429 id: wagenmaker22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22384 lastpage: 22429 published: 2022-06-28 00:00:00 +0000 - title: 'Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes' abstract: 'Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy for an arbitrary reward function revealed only after exploring. In the tabular setting, it is well known that this is a more difficult problem than reward-aware (PAC) RL—where the agent has access to the reward function during exploration—with optimal sample complexities in the two settings differing by a factor of $|\mathcal{S}|$, the size of the state space. We show that this separation does not exist in the setting of linear MDPs. We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\widetilde{\mathcal{O}}(d^2 H^5/\epsilon^2)$. We then show a lower bound with matching dimension-dependence of $\Omega(d^2 H^2/\epsilon^2)$, which holds for the reward-aware RL setting. To our knowledge, our approach is the first computationally efficient algorithm to achieve optimal $d$ dependence in linear MDPs, even in the single-reward PAC setting. Our algorithm relies on a novel procedure which efficiently traverses a linear MDP, collecting samples in any given “feature direction”, and enjoys a sample complexity scaling optimally in the (linear MDP equivalent of the) maximal state visitation probability. We show that this exploration procedure can also be applied to solve the problem of obtaining “well-conditioned” covariates in linear MDPs.'
volume: 162 URL: https://proceedings.mlr.press/v162/wagenmaker22b.html PDF: https://proceedings.mlr.press/v162/wagenmaker22b/wagenmaker22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wagenmaker22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrew J family: Wagenmaker - given: Yifang family: Chen - given: Max family: Simchowitz - given: Simon family: Du - given: Kevin family: Jamieson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22430-22456 id: wagenmaker22b issued: date-parts: - 2022 - 6 - 28 firstpage: 22430 lastpage: 22456 published: 2022-06-28 00:00:00 +0000 - title: 'Training Characteristic Functions with Reinforcement Learning: XAI-methods play Connect Four' abstract: 'Characteristic functions (from cooperative game theory) are able to evaluate partial inputs and form the basis for attribution methods like Shapley values. These attribution methods allow us to measure how important each input component is for the function output—one of the goals of explainable AI (XAI). Given a standard classifier function, it is unclear how partial input should be realised. Instead, most XAI-methods for black-box classifiers like neural networks consider counterfactual inputs that generally lie off-manifold, which makes them hard to evaluate and easy to manipulate. We propose a setup to directly train characteristic functions in the form of neural networks to play simple two-player games. We apply this to the game of Connect Four by randomly hiding colour information from our agents during training. This has three advantages for comparing XAI-methods: It alleviates the ambiguity about how to realise partial input, makes off-manifold evaluation unnecessary and allows us to compare the methods by letting them play against each other.' volume: 162 URL: https://proceedings.mlr.press/v162/waldchen22a.html PDF: https://proceedings.mlr.press/v162/waldchen22a/waldchen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-waldchen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Stephan family: Wäldchen - given: Sebastian family: Pokutta - given: Felix family: Huber editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22457-22474 id: waldchen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22457 lastpage: 22474 published: 2022-06-28 00:00:00 +0000 - title: 'Retroformer: Pushing the Limits of End-to-end Retrosynthesis Transformer' abstract: 'Retrosynthesis prediction is one of the fundamental challenges in organic synthesis. The task is to predict the reactants given a core product. With the advancement of machine learning, computer-aided synthesis planning has gained increasing interest. Numerous methods were proposed to solve this problem with different levels of dependency on additional chemical knowledge. In this paper, we propose Retroformer, a novel Transformer-based architecture for retrosynthesis prediction without relying on any cheminformatics tools for molecule editing. 
Via the proposed local attention head, the model can jointly encode the molecular sequence and graph, and efficiently exchange information between the local reactive region and the global reaction context. Retroformer reaches the new state-of-the-art accuracy for the end-to-end template-free retrosynthesis, and improves over many strong baselines on better molecule and reaction validity. In addition, its generative procedure is highly interpretable and controllable. Overall, Retroformer pushes the limits of the reaction reasoning ability of deep generative models.' volume: 162 URL: https://proceedings.mlr.press/v162/wan22a.html PDF: https://proceedings.mlr.press/v162/wan22a/wan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yue family: Wan - given: Chang-Yu family: Hsieh - given: Ben family: Liao - given: Shengyu family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22475-22490 id: wan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22475 lastpage: 22490 published: 2022-06-28 00:00:00 +0000 - title: 'Safe Exploration for Efficient Policy Evaluation and Comparison' abstract: 'High-quality data plays a central role in ensuring the accuracy of policy evaluation. This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate its several representative variants. For each variant, we analyze its statistical properties, derive the corresponding exploration policy, and design an efficient algorithm for computing it. Both theoretical analysis and experiments support the usefulness of the proposed methods.' volume: 162 URL: https://proceedings.mlr.press/v162/wan22b.html PDF: https://proceedings.mlr.press/v162/wan22b/wan22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wan22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Runzhe family: Wan - given: Branislav family: Kveton - given: Rui family: Song editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22491-22511 id: wan22b issued: date-parts: - 2022 - 6 - 28 firstpage: 22491 lastpage: 22511 published: 2022-06-28 00:00:00 +0000 - title: 'Greedy based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning' abstract: 'Due to the representation limitation of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they can not ensure optimal consistency (i.e., the correspondence between individual greedy actions and the best team performance). In this paper, we derive the expression of the joint Q value function of LVD and MVD. According to the expression, we draw a transition diagram, where each self-transition node (STN) is a possible convergence. To ensure the optimal consistency, the optimal node is required to be the unique STN. 
Therefore, we propose the greedy-based value representation (GVR), which turns the optimal node into an STN via inferior target shaping and eliminates the non-optimal STNs via superior experience replay. Theoretical proofs and empirical results demonstrate that given the true Q values, GVR ensures the optimal consistency under sufficient exploration. Besides, in tasks where the true Q values are unavailable, GVR achieves an adaptive trade-off between optimality and stability. Our method outperforms state-of-the-art baselines in experiments on various benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/wan22c.html PDF: https://proceedings.mlr.press/v162/wan22c/wan22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wan22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lipeng family: Wan - given: Zeyang family: Liu - given: Xingyu family: Chen - given: Xuguang family: Lan - given: Nanning family: Zheng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22512-22535 id: wan22c issued: date-parts: - 2022 - 6 - 28 firstpage: 22512 lastpage: 22535 published: 2022-06-28 00:00:00 +0000 - title: 'Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods' abstract: 'In recent years, a growing number of deep model-based reinforcement learning (RL) methods have been introduced. The interest in deep model-based RL is not surprising, given its many potential benefits, such as higher sample efficiency and the potential for fast adaption to changes in the environment. However, we demonstrate, using an improved version of the recently introduced Local Change Adaptation (LoCA) setup, that well-known model-based methods such as PlaNet and DreamerV2 perform poorly in their ability to adapt to local environmental changes. Combined with prior work that made a similar observation about the other popular model-based method, MuZero, a trend appears to emerge, suggesting that current deep model-based methods have serious limitations. We dive deeper into the causes of this poor performance, by identifying elements that hurt adaptive behavior and linking these to underlying techniques frequently used in deep model-based RL. We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method, by experimenting with a nonlinear version of Dyna.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wan22d.html PDF: https://proceedings.mlr.press/v162/wan22d/wan22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wan22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yi family: Wan - given: Ali family: Rahimi-Kalahroudi - given: Janarthanan family: Rajendran - given: Ida family: Momennejad - given: Sarath family: Chandar - given: Harm H family: Van Seijen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22536-22561 id: wan22d issued: date-parts: - 2022 - 6 - 28 firstpage: 22536 lastpage: 22561 published: 2022-06-28 00:00:00 +0000 - title: 'Fast Lossless Neural Compression with Integer-Only Discrete Flows' abstract: 'By applying entropy codecs with learned data distributions, neural compressors have significantly outperformed traditional codecs in terms of compression ratio. However, the high inference latency of neural networks hinders the deployment of neural compressors in practical applications. In this work, we propose Integer-only Discrete Flows (IODF) an efficient neural compressor with integer-only arithmetic. Our work is built upon integer discrete flows, which consists of invertible transformations between discrete random variables. We propose efficient invertible transformations with integer-only arithmetic based on 8-bit quantization. Our invertible transformation is equipped with learnable binary gates to remove redundant filters during inference. We deploy IODF with TensorRT on GPUs, achieving $10\times$ inference speedup compared to the fastest existing neural compressors, while retaining the high compression rates on ImageNet32 and ImageNet64.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22a.html PDF: https://proceedings.mlr.press/v162/wang22a/wang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Siyu family: Wang - given: Jianfei family: Chen - given: Chongxuan family: Li - given: Jun family: Zhu - given: Bo family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22562-22575 id: wang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 22562 lastpage: 22575 published: 2022-06-28 00:00:00 +0000 - title: 'Accelerating Shapley Explanation via Contributive Cooperator Selection' abstract: 'Even though Shapley value provides an effective explanation for a DNN model prediction, the computation relies on the enumeration of all possible input feature coalitions, which leads to the exponentially growing complexity. To address this problem, we propose a novel method SHEAR to significantly accelerate the Shapley explanation for DNN models, where only a few coalitions of input features are involved in the computation. The selection of the feature coalitions follows our proposed Shapley chain rule to minimize the absolute error from the ground-truth Shapley values, such that the computation can be both efficient and accurate. 
To demonstrate the effectiveness, we comprehensively evaluate SHEAR across multiple metrics including the absolute error from the ground-truth Shapley value, the faithfulness of the explanations, and running speed. The experimental results indicate SHEAR consistently outperforms state-of-the-art baseline methods across different evaluation metrics, which demonstrates its potential in real-world applications where computational resources are limited.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22b.html PDF: https://proceedings.mlr.press/v162/wang22b/wang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Guanchu family: Wang - given: Yu-Neng family: Chuang - given: Mengnan family: Du - given: Fan family: Yang - given: Quan family: Zhou - given: Pushkar family: Tripathi - given: Xuanting family: Cai - given: Xia family: Hu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22576-22590 id: wang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 22576 lastpage: 22590 published: 2022-06-28 00:00:00 +0000 - title: 'Denoised MDPs: Learning World Models Better Than the World Itself' abstract: 'The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence. With this ability, humans can efficiently perform real world tasks without considering all possible nuisance factors. How can artificial agents do the same? What kind of information can agents safely discard as noise? In this work, we categorize information out in the wild into four types based on controllability and relation with reward, and formulate useful information as that which is both controllable and reward-relevant. This framework clarifies the kinds of information removed by various prior work on representation learning in reinforcement learning (RL), and leads to our proposed approach of learning a Denoised MDP that explicitly factors out certain noise distractors. Extensive experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone, and over prior works, across policy optimization control tasks as well as the non-control task of joint position regression.
Project Page: https://ssnl.github.io/denoised_mdp/ Code: https://github.com/facebookresearch/denoised_mdp/' volume: 162 URL: https://proceedings.mlr.press/v162/wang22c.html PDF: https://proceedings.mlr.press/v162/wang22c/wang22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tongzhou family: Wang - given: Simon family: Du - given: Antonio family: Torralba - given: Phillip family: Isola - given: Amy family: Zhang - given: Yuandong family: Tian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22591-22612 id: wang22c issued: date-parts: - 2022 - 6 - 28 firstpage: 22591 lastpage: 22612 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Implicit Dictionary Learning via Mixture-of-Expert Training' abstract: 'Representing visual signals by coordinate-based deep fully-connected networks has been shown advantageous in fitting complex details and solving inverse problems than discrete grid-based representation. However, acquiring such a continuous Implicit Neural Representation (INR) requires tedious per-scene training on tons of signal measurements, which limits its practicality. In this paper, we present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID) from a data collection and representing INR as a functional combination of wavelets sampled from the dictionary. Our NID assembles a group of coordinate-based subnetworks which are tuned to span the desired function space. After training, one can instantly and robustly acquire an unseen scene representation by solving the coding coefficients. To parallelly optimize a large group of networks, we borrow the idea from Mixture-of-Expert (MoE) to design and train our network with a sparse gating mechanism. Our experiments show that, NID can improve reconstruction of 2D images or 3D scenes by 2 orders of magnitude faster with up to 98% less input data. We further demonstrate various applications of NID in image inpainting and occlusion removal, which are considered to be challenging with vanilla INR. Our codes are available in https://github.com/VITA-Group/Neural-Implicit-Dict.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22d.html PDF: https://proceedings.mlr.press/v162/wang22d/wang22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peihao family: Wang - given: Zhiwen family: Fan - given: Tianlong family: Chen - given: Zhangyang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22613-22624 id: wang22d issued: date-parts: - 2022 - 6 - 28 firstpage: 22613 lastpage: 22624 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Models Are More Interpretable Because Attributions Look Normal' abstract: 'Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image’s ground-truth class. We show that smooth decision boundaries play an important role in this enhanced interpretability, as the model’s input gradients around data points will more closely align with boundaries’ normal vectors when they are smooth. Thus, because robust models have smoother boundaries, the results of gradient-based attribution methods, like Integrated Gradients and DeepLift, will capture more accurate information about nearby decision boundaries. This understanding of robust interpretability leads to our second contribution: boundary attributions, which aggregate information about the normal vectors of local decision boundaries to explain a classification outcome. We show that by leveraging the key factors underpinning robust interpretability, boundary attributions produce sharper, more concentrated visual explanations, even on non-robust models.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22e.html PDF: https://proceedings.mlr.press/v162/wang22e/wang22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zifan family: Wang - given: Matt family: Fredrikson - given: Anupam family: Datta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22625-22651 id: wang22e issued: date-parts: - 2022 - 6 - 28 firstpage: 22625 lastpage: 22651 published: 2022-06-28 00:00:00 +0000 - title: 'Disentangling Disease-related Representation from Obscure for Disease Prediction' abstract: 'Disease-related representations play a crucial role in image-based disease prediction such as cancer diagnosis, due to their considerable generalization capacity. However, it is still a challenge to identify lesion characteristics in obscured images, as many lesions are obscured by other tissues. In this paper, to learn the representations for identifying obscured lesions, we propose a disentanglement learning strategy under the guidance of alpha blending generation in an encoder-decoder framework (DAB-Net). Specifically, we take mammogram mass benign/malignant classification as an example.
In our framework, composite obscured mass images are generated by alpha blending and then explicitly disentangled into disease-related mass features and interference glands features. To achieve disentanglement learning, features of these two parts are decoded to reconstruct the mass and the glands with corresponding reconstruction losses, and only disease-related mass features are fed into the classifier for disease prediction. Experimental results on one public dataset DDSM and three in-house datasets demonstrate that the proposed strategy can achieve state-of-the-art performance. DAB-Net achieves substantial improvements of 3.9%~4.4% AUC in obscured cases. Besides, the visualization analysis shows the model can better disentangle the mass and glands in the obscured image, suggesting the effectiveness of our solution in exploring the hidden characteristics in this challenging problem.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22f.html PDF: https://proceedings.mlr.press/v162/wang22f/wang22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chu-Ran family: Wang - given: Fei family: Gao - given: Fandong family: Zhang - given: Fangwei family: Zhong - given: Yizhou family: Yu - given: Yizhou family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22652-22664 id: wang22f issued: date-parts: - 2022 - 6 - 28 firstpage: 22652 lastpage: 22664 published: 2022-06-28 00:00:00 +0000 - title: 'Solving Stackelberg Prediction Game with Least Squares Loss via Spherically Constrained Least Squares Reformulation' abstract: 'The Stackelberg prediction game (SPG) is popular in characterizing strategic interactions between a learner and an attacker. As an important special case, the SPG with least squares loss (SPG-LS) has recently received much research attention. Although initially formulated as a difficult bi-level optimization problem, SPG-LS admits tractable reformulations which can be polynomially globally solved by semidefinite programming or second order cone programming. However, all the available approaches are not well-suited for handling large-scale datasets, especially those with huge numbers of features. In this paper, we explore an alternative reformulation of the SPG-LS. By a novel nonlinear change of variables, we rewrite the SPG-LS as a spherically constrained least squares (SCLS) problem. Theoretically, we show that an $\epsilon$ optimal solutions to the SCLS (and the SPG-LS) can be achieved in $\tilde O(N/\sqrt{\epsilon})$ floating-point operations, where $N$ is the number of nonzero entries in the data matrix. Practically, we apply two well-known methods for solving this new reformulation, i.e., the Krylov subspace method and the Riemannian trust region method. Both algorithms are factorization free so that they are suitable for solving large scale problems. Numerical results on both synthetic and real-world datasets indicate that the SPG-LS, equipped with the SCLS reformulation, can be solved orders of magnitude faster than the state of the art.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22g.html PDF: https://proceedings.mlr.press/v162/wang22g/wang22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiali family: Wang - given: Wen family: Huang - given: Rujun family: Jiang - given: Xudong family: Li - given: Alex L family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22665-22679 id: wang22g issued: date-parts: - 2022 - 6 - 28 firstpage: 22665 lastpage: 22679 published: 2022-06-28 00:00:00 +0000 - title: 'VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix' abstract: 'Existing vision-language pre-training (VLP) methods primarily rely on paired image-text datasets, which are either annotated with enormous human labor or crawled from the internet followed by elaborate data cleaning techniques. To reduce the dependency on well-aligned image-text pairs, it is promising to directly leverage the large-scale text-only and image-only corpora. This paper proposes a data augmentation method, namely cross-modal CutMix (CMC), for implicit cross-modal alignment learning in unpaired VLP. Specifically, CMC transforms natural sentences in the textual view into a multi-modal view, where visually-grounded words in a sentence are randomly replaced by diverse image patches with similar semantics. There are several appealing properties of the proposed CMC. First, it enhances the data diversity while keeping the semantic meaning intact for tackling problems where the aligned data are scarce; second, by attaching cross-modal noise on uni-modal data, it guides models to learn token-level interactions across modalities for better denoising. Furthermore, we present a new unpaired VLP method, dubbed VLMixer, that integrates CMC with contrastive learning to pull together the uni-modal and multi-modal views for better instance-level alignments among different modalities. Extensive experiments on five downstream tasks show that VLMixer could surpass previous state-of-the-art unpaired VLP methods.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22h.html PDF: https://proceedings.mlr.press/v162/wang22h/wang22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Teng family: Wang - given: Wenhao family: Jiang - given: Zhichao family: Lu - given: Feng family: Zheng - given: Ran family: Cheng - given: Chengguo family: Yin - given: Ping family: Luo editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22680-22690 id: wang22h issued: date-parts: - 2022 - 6 - 28 firstpage: 22680 lastpage: 22690 published: 2022-06-28 00:00:00 +0000 - title: 'DynaMixer: A Vision MLP Architecture with Dynamic Mixing' abstract: 'Recently, MLP-like vision models have achieved promising performance on mainstream visual recognition tasks.
In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models. However, existing MLP-like models fuse tokens through static fusion operations, lacking adaptability to the contents of the tokens to be mixed. Thus, customary information fusion procedures are not effective enough. To this end, this paper presents an efficient MLP-like network architecture, dubbed DynaMixer, resorting to dynamic information fusion. Critically, we propose a procedure, on which the DynaMixer model relies, to dynamically generate mixing matrices by leveraging the contents of all the tokens to be mixed. To reduce the time complexity and improve the robustness, a dimensionality reduction technique and a multi-segment fusion mechanism are adopted. Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset without extra training data, performing favorably against the state-of-the-art vision MLP models. When the number of parameters is reduced to 26M, it still achieves 82.7% top-1 accuracy, surpassing the existing MLP-like models with a similar capacity. The code is available at \url{https://github.com/ziyuwwang/DynaMixer}.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22i.html PDF: https://proceedings.mlr.press/v162/wang22i/wang22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ziyu family: Wang - given: Wenhao family: Jiang - given: Yiming M family: Zhu - given: Li family: Yuan - given: Yibing family: Song - given: Wei family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22691-22701 id: wang22i issued: date-parts: - 2022 - 6 - 28 firstpage: 22691 lastpage: 22701 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Screening Processes via Calibrated Subset Selection' abstract: 'Many selection processes such as finding patients qualifying for a medical trial or retrieval pipelines in search engines consist of multiple stages, where an initial screening stage focuses the resources on shortlisting the most promising candidates. In this paper, we investigate what guarantees a screening classifier can provide, independently of whether it is constructed manually or trained. We find that current solutions do not enjoy distribution-free theoretical guarantees and we show that, in general, even for a perfectly calibrated classifier, there always exist specific pools of candidates for which its shortlist is suboptimal. Then, we develop a distribution-free screening algorithm—called Calibrated Subset Selection (CSS)—that, given any classifier and some amount of calibration data, finds near-optimal shortlists of candidates that contain a desired number of qualified candidates in expectation. Moreover, we show that a variant of CSS that calibrates a given classifier multiple times across specific groups can create shortlists with provable diversity guarantees. Experiments on US Census survey data validate our theoretical results and show that the shortlists provided by our algorithm are superior to those provided by several competitive baselines.'
volume: 162 URL: https://proceedings.mlr.press/v162/wang22j.html PDF: https://proceedings.mlr.press/v162/wang22j/wang22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lequn family: Wang - given: Thorsten family: Joachims - given: Manuel Gomez family: Rodriguez editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22702-22726 id: wang22j issued: date-parts: - 2022 - 6 - 28 firstpage: 22702 lastpage: 22726 published: 2022-06-28 00:00:00 +0000 - title: 'The Geometry of Robust Value Functions' abstract: 'The space of value functions is a fundamental concept in reinforcement learning. Characterizing its geometric properties may provide insights for optimization and representation. Existing works mainly focus on the value space for Markov Decision Processes (MDPs). In this paper, we study the geometry of the robust value space for the more general Robust MDPs (RMDPs) setting, where transition uncertainties are considered. Specifically, since we find it hard to directly adapt prior approaches to RMDPs, we start with revisiting the non-robust case, and introduce a new perspective that enables us to characterize both the non-robust and robust value space in a similar fashion. The key of this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces. Through our analysis, we show that the robust value space is determined by a set of conic hypersurfaces, each of which contains the robust values of all policies that agree on one state. Furthermore, we find that taking only extreme points in the uncertainty set is sufficient to determine the robust value space. Finally, we discuss some other aspects about the robust value space, including its non-convexity and policy agreement on multiple states.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22k.html PDF: https://proceedings.mlr.press/v162/wang22k/wang22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kaixin family: Wang - given: Navdeep family: Kumar - given: Kuangqi family: Zhou - given: Bryan family: Hooi - given: Jiashi family: Feng - given: Shie family: Mannor editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22727-22751 id: wang22k issued: date-parts: - 2022 - 6 - 28 firstpage: 22727 lastpage: 22751 published: 2022-06-28 00:00:00 +0000 - title: 'What Dense Graph Do You Need for Self-Attention?' abstract: 'Transformers have made progress in miscellaneous tasks, but suffer from quadratic computational and memory complexities. Recent works propose sparse transformers with attention on sparse graphs to reduce complexity and remain strong performance. While effective, the crucial parts of how dense a graph needs to be to perform well are not fully explored. 
In this paper, we propose Normalized Information Payload (NIP), a graph scoring function measuring information transfer on graph, which provides an analysis tool for trade-offs between performance and complexity. Guided by this theoretical analysis, we present Hypercube Transformer, a sparse transformer that models token interactions in a hypercube and shows comparable or even better results with vanilla transformer while yielding $O(N\log N)$ complexity with sequence length $N$. Experiments on tasks requiring various sequence lengths lay validation for our graph function well.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22l.html PDF: https://proceedings.mlr.press/v162/wang22l/wang22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuxin family: Wang - given: Chu-Tak family: Lee - given: Qipeng family: Guo - given: Zhangyue family: Yin - given: Yunhua family: Zhou - given: Xuanjing family: Huang - given: Xipeng family: Qiu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22752-22768 id: wang22l issued: date-parts: - 2022 - 6 - 28 firstpage: 22752 lastpage: 22768 published: 2022-06-28 00:00:00 +0000 - title: 'Improved Certified Defenses against Data Poisoning with (Deterministic) Finite Aggregation' abstract: 'Data poisoning attacks aim at manipulating model behaviors through distorting training data. Previously, an aggregation-based certified defense, Deep Partition Aggregation (DPA), was proposed to mitigate this threat. DPA predicts through an aggregation of base classifiers trained on disjoint subsets of data, thus restricting its sensitivity to dataset distortions. In this work, we propose an improved certified defense against general poisoning attacks, namely Finite Aggregation. In contrast to DPA, which directly splits the training set into disjoint subsets, our method first splits the training set into smaller disjoint subsets and then combines duplicates of them to build larger (but not disjoint) subsets for training base classifiers. This reduces the worst-case impacts of poison samples and thus improves certified robustness bounds. In addition, we offer an alternative view of our method, bridging the designs of deterministic and stochastic aggregation-based certified defenses. Empirically, our proposed Finite Aggregation consistently improves certificates on MNIST, CIFAR-10, and GTSRB, boosting certified fractions by up to 3.05%, 3.87% and 4.77%, respectively, while keeping the same clean accuracies as DPA’s, effectively establishing a new state of the art in (pointwise) certified robustness against data poisoning.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22m.html PDF: https://proceedings.mlr.press/v162/wang22m/wang22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wenxiao family: Wang - given: Alexander J family: Levine - given: Soheil family: Feizi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22769-22783 id: wang22m issued: date-parts: - 2022 - 6 - 28 firstpage: 22769 lastpage: 22783 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond' abstract: 'The vast majority of existing algorithms for unsupervised domain adaptation (UDA) focus on adapting from a labeled source domain to an unlabeled target domain directly in a one-off way. Gradual domain adaptation (GDA), on the other hand, assumes a path of $(T-1)$ unlabeled intermediate domains bridging the source and target, and aims to provide better generalization in the target domain by leveraging the intermediate ones. Under certain assumptions, Kumar et al. (2020) proposed a simple algorithm, Gradual Self-Training, along with a generalization bound in the order of $e^{O(T)} \left(\varepsilon_0+O\left(\sqrt{log(T)/n}\right)\right)$ for the target domain error, where $\varepsilon_0$ is the source domain error and $n$ is the data size of each domain. Due to the exponential factor, this upper bound becomes vacuous when $T$ is only moderately large. In this work, we analyze gradual self-training under more general and relaxed assumptions, and prove a significantly improved generalization bound as $\widetilde{O}\left(\varepsilon_0 + T\Delta + T/\sqrt{n} + 1/\sqrt{nT}\right)$, where $\Delta$ is the average distributional distance between consecutive domains. Compared with the existing bound with an exponential dependency on $T$ as a multiplicative factor, our bound only depends on $T$ linearly and additively. Perhaps more interestingly, our result implies the existence of an optimal choice of $T$ that minimizes the generalization error, and it also naturally suggests an optimal way to construct the path of intermediate domains so as to minimize the accumulative path length $T\Delta$ between the source and target. To corroborate the implications of our theory, we examine gradual self-training on multiple semi-synthetic and real datasets, which confirms our findings. We believe our insights provide a path forward toward the design of future GDA algorithms.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22n.html PDF: https://proceedings.mlr.press/v162/wang22n/wang22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haoxiang family: Wang - given: Bo family: Li - given: Han family: Zhao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22784-22801 id: wang22n issued: date-parts: - 2022 - 6 - 28 firstpage: 22784 lastpage: 22801 published: 2022-06-28 00:00:00 +0000 - title: 'Communication-Efficient Adaptive Federated Learning' abstract: 'Federated learning is a machine learning training paradigm that enables clients to jointly train models without sharing their own localized data. However, the implementation of federated learning in practice still faces numerous challenges, such as the large communication overhead due to the repetitive server-client synchronization and the lack of adaptivity by SGD-based model updates. Despite that various methods have been proposed for reducing the communication cost by gradient compression or quantization, and the federated versions of adaptive optimizers such as FedAdam are proposed to add more adaptivity, the current federated learning framework still cannot solve the aforementioned challenges all at once. In this paper, we propose a novel communication-efficient adaptive federated learning method (FedCAMS) with theoretical convergence guarantees. We show that in the nonconvex stochastic optimization setting, our proposed FedCAMS achieves the same convergence rate of $O(\frac{1}{\sqrt{TKm}})$ as its non-compressed counterparts. Extensive experiments on various benchmarks verify our theoretical analysis.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22o.html PDF: https://proceedings.mlr.press/v162/wang22o/wang22o.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22o.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yujia family: Wang - given: Lu family: Lin - given: Jinghui family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22802-22838 id: wang22o issued: date-parts: - 2022 - 6 - 28 firstpage: 22802 lastpage: 22838 published: 2022-06-28 00:00:00 +0000 - title: 'Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Lojasiewicz Functions when the Non-Convexity is Averaged-Out' abstract: 'Heavy Ball (HB) nowadays is one of the most popular momentum methods in non-convex optimization. It has been widely observed that incorporating the Heavy Ball dynamic in gradient-based methods accelerates the training process of modern machine learning models. However, the progress on establishing its theoretical foundation of acceleration is apparently far behind its empirical success. Existing provable acceleration results are of the quadratic or close-to-quadratic functions, as the current techniques of showing HB’s acceleration are limited to the case when the Hessian is fixed. 
In this work, we develop some new techniques that help show acceleration beyond quadratics, which is achieved by analyzing how the change of the Hessian at two consecutive time points affects the convergence speed. Based on our technical results, a class of Polyak-Lojasiewicz (PL) optimization problems for which provable acceleration can be achieved via HB is identified. Moreover, our analysis demonstrates a benefit of adaptively setting the momentum parameter.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22p.html PDF: https://proceedings.mlr.press/v162/wang22p/wang22p.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22p.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jun-Kun family: Wang - given: Chi-Heng family: Lin - given: Andre family: Wibisono - given: Bin family: Hu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22839-22864 id: wang22p issued: date-parts: - 2022 - 6 - 28 firstpage: 22839 lastpage: 22864 published: 2022-06-28 00:00:00 +0000 - title: 'Robustness Verification for Contrastive Learning' abstract: 'Contrastive adversarial training has successfully improved the robustness of contrastive learning (CL). However, the robustness metric used in these methods is linked to attack algorithms, image labels and downstream tasks, all of which may affect the consistency and reliability of robustness metric for CL. To address these problems, this paper proposes a novel Robustness Verification framework for Contrastive Learning (RVCL). Furthermore, we use extreme value theory to reveal the relationship between the robust radius of the CL encoder and that of the supervised downstream task. Extensive experimental results on various benchmark models and datasets verify our theoretical findings, and further demonstrate that our proposed RVCL is able to evaluate the robustness of both models and images. Our code is available at https://github.com/wzekai99/RVCL.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22q.html PDF: https://proceedings.mlr.press/v162/wang22q/wang22q.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22q.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zekai family: Wang - given: Weiwei family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22865-22883 id: wang22q issued: date-parts: - 2022 - 6 - 28 firstpage: 22865 lastpage: 22883 published: 2022-06-28 00:00:00 +0000 - title: 'Convergence and Recovery Guarantees of the K-Subspaces Method for Subspace Clustering' abstract: 'The K-subspaces (KSS) method is a generalization of the K-means method for subspace clustering. In this work, we present local convergence analysis and a recovery guarantee for KSS, assuming data are generated by the semi-random union of subspaces model, where $N$ points are randomly sampled from $K \ge 2$ overlapping subspaces. 
We show that if the initial assignment of the KSS method lies within a neighborhood of a true clustering, it converges at a superlinear rate and finds the correct clustering within $\Theta(\log\log N)$ iterations with high probability. Moreover, we propose a thresholding inner-product based spectral method for initialization and prove that it produces a point in this neighborhood. We also present numerical results of the studied method to support our theoretical developments.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22r.html PDF: https://proceedings.mlr.press/v162/wang22r/wang22r.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22r.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peng family: Wang - given: Huikang family: Liu - given: Anthony Man-Cho family: So - given: Laura family: Balzano editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22884-22918 id: wang22r issued: date-parts: - 2022 - 6 - 28 firstpage: 22884 lastpage: 22918 published: 2022-06-28 00:00:00 +0000 - title: 'NP-Match: When Neural Processes meet Semi-Supervised Learning' abstract: 'Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data. In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match. NP-Match is suited to this task for two reasons. Firstly, NP-Match implicitly compares data points when making predictions, and as a result, the prediction of each unlabeled data point is affected by the labeled data points that are similar to it, which improves the quality of pseudo-labels. Secondly, NP-Match is able to estimate uncertainty that can be used as a tool for selecting unlabeled samples with reliable pseudo-labels. Compared with uncertainty-based SSL methods implemented with Monte Carlo (MC) dropout, NP-Match estimates uncertainty with much less computational overhead, which can save time at both the training and the testing phases. We conducted extensive experiments on four public datasets, and NP-Match outperforms state-of-the-art (SOTA) results or achieves competitive results on them, which shows the effectiveness of NP-Match and its potential for SSL.'
volume: 162 URL: https://proceedings.mlr.press/v162/wang22s.html PDF: https://proceedings.mlr.press/v162/wang22s/wang22s.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22s.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jianfeng family: Wang - given: Thomas family: Lukasiewicz - given: Daniela family: Massiceti - given: Xiaolin family: Hu - given: Vladimir family: Pavlovic - given: Alexandros family: Neophytou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22919-22934 id: wang22s issued: date-parts: - 2022 - 6 - 28 firstpage: 22919 lastpage: 22934 published: 2022-06-28 00:00:00 +0000 - title: 'Iterative Double Sketching for Faster Least-Squares Optimization' abstract: 'This work is concerned with the overdetermined linear least-squares problem for large scale data. We generalize the iterative Hessian sketching (IHS) algorithm and propose a new sketching framework named iterative double sketching (IDS) which uses approximations for both the gradient and the Hessian in each iteration. To understand the behavior of the IDS algorithm and choose the optimal hyperparameters, we derive the exact limit of the conditional prediction error of the IDS algorithm in the setting of Gaussian sketching. Guided by this theoretical result, we propose an efficient IDS algorithm via a new class of sequentially related sketching matrices. We give a non-asymptotic analysis of this efficient IDS algorithm which shows that the proposed algorithm achieves the state-of-the-art trade-off between accuracy and efficiency.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22t.html PDF: https://proceedings.mlr.press/v162/wang22t/wang22t.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22t.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rui family: Wang - given: Yanyan family: Ouyang - given: Wangli family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22935-22963 id: wang22t issued: date-parts: - 2022 - 6 - 28 firstpage: 22935 lastpage: 22963 published: 2022-06-28 00:00:00 +0000 - title: 'What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?' abstract: 'Large pretrained Transformer language models have been shown to exhibit zero-shot generalization, i.e. they can perform a wide variety of tasks that they were not explicitly trained on. However, the architectures and pretraining objectives used across state-of-the-art models differ significantly, and there has been limited systematic comparison of these factors. In this work, we present a large-scale evaluation of modeling choices and their impact on zero-shot generalization. In particular, we focus on text-to-text models and experiment with three model architectures (causal/non-causal decoder-only and encoder-decoder), trained with two different pretraining objectives (autoregressive and masked language modeling), and evaluated with and without multitask prompted finetuning. 
We train models with over 5 billion parameters for more than 168 billion tokens, thereby increasing the likelihood that our conclusions will transfer to even larger scales. Our experiments show that causal decoder-only models trained on an autoregressive language modeling objective exhibit the strongest zero-shot generalization after purely self-supervised pretraining. However, models with non-causal visibility on their input trained with a masked language modeling objective followed by multitask finetuning perform the best among our experiments. We therefore consider the adaptation of pretrained models across architectures and objectives. Code and checkpoints are available at https://github.com/bigscience-workshop/architecture-objective.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22u.html PDF: https://proceedings.mlr.press/v162/wang22u/wang22u.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22u.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Thomas family: Wang - given: Adam family: Roberts - given: Daniel family: Hesslow - given: Teven Le family: Scao - given: Hyung Won family: Chung - given: Iz family: Beltagy - given: Julien family: Launay - given: Colin family: Raffel editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22964-22984 id: wang22u issued: date-parts: - 2022 - 6 - 28 firstpage: 22964 lastpage: 22984 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Task-free Continual Learning by Distributionally Robust Memory Evolution' abstract: 'Task-free continual learning (CL) aims to learn a non-stationary data stream without explicit task definitions and not forget previous knowledge. The widely adopted memory replay approach could gradually become less effective for long data streams, as the model may memorize the stored examples and overfit the memory buffer. Second, existing methods overlook the high uncertainty in the memory data distribution since there is a big gap between the memory data distribution and the distribution of all the previous data examples. To address these problems, for the first time, we propose a principled memory evolution framework to dynamically evolve the memory data distribution by making the memory buffer gradually harder to be memorized with distributionally robust optimization (DRO). We then derive a family of methods to evolve the memory buffer data in the continuous probability measure space with Wasserstein gradient flow (WGF). The proposed DRO is w.r.t. the worst-case evolved memory data distribution, thus guarantees the model performance and learns significantly more robust features than existing memory-replay-based methods. Extensive experiments on existing benchmarks demonstrate the effectiveness of the proposed methods for alleviating forgetting. As a by-product of the proposed framework, our method is more robust to adversarial examples than existing task-free CL methods.'
volume: 162 URL: https://proceedings.mlr.press/v162/wang22v.html PDF: https://proceedings.mlr.press/v162/wang22v/wang22v.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22v.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhenyi family: Wang - given: Li family: Shen - given: Le family: Fang - given: Qiuling family: Suo - given: Tiehang family: Duan - given: Mingchen family: Gao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22985-22998 id: wang22v issued: date-parts: - 2022 - 6 - 28 firstpage: 22985 lastpage: 22998 published: 2022-06-28 00:00:00 +0000 - title: 'Risk-Averse No-Regret Learning in Online Convex Games' abstract: 'We consider an online stochastic game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk measure that the agents can estimate using bandit feedback in the form of the cost values of only their selected actions. Since the distributions of the cost functions depend on the actions of all agents that are generally unobservable, they are themselves unknown and, therefore, the CVaR values of the costs are difficult to compute. To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions. We show that this algorithm achieves sub-linear regret with high probability. We also propose two variants of this algorithm that improve performance. The first variant relies on a new sampling strategy that uses samples from the previous iteration to improve the estimation accuracy of the CVaR values. The second variant employs residual feedback that uses CVaR values from the previous iteration to reduce the variance of the CVaR gradient estimates. We theoretically analyze the convergence properties of these variants and illustrate their performance on an online market problem that we model as a Cournot game.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22w.html PDF: https://proceedings.mlr.press/v162/wang22w/wang22w.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22w.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zifan family: Wang - given: Yi family: Shen - given: Michael family: Zavlanos editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 22999-23017 id: wang22w issued: date-parts: - 2022 - 6 - 28 firstpage: 22999 lastpage: 23017 published: 2022-06-28 00:00:00 +0000 - title: 'Provable Domain Generalization via Invariant-Feature Subspace Recovery' abstract: 'Domain generalization asks for models trained over a set of training environments to perform well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) has been proposed for domain generalization. However, Rosenfeld et al. 
(2021) shows that in a simple linear data model, even if non-convexity issues are ignored, IRM and its extensions cannot generalize to unseen environments with fewer than $d_s+1$ training environments, where $d_s$ is the dimension of the spurious-feature subspace. In this paper, we propose to achieve domain generalization with Invariant-feature Subspace Recovery (ISR). Our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieve provable domain generalization with $d_s+1$ training environments under the data model of Rosenfeld et al. (2021). Our second algorithm, ISR-Cov, further reduces the required number of training environments to $O(1)$ using the information of second-order moments. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. Empirically, our ISRs can obtain superior performance compared with IRM on synthetic benchmarks. In addition, on three real-world image and text datasets, we show that both ISRs can be used as simple yet effective post-processing methods to improve the worst-case accuracy of (pre-)trained models against spurious correlations and group shifts.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22x.html PDF: https://proceedings.mlr.press/v162/wang22x/wang22x.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22x.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haoxiang family: Wang - given: Haozhe family: Si - given: Bo family: Li - given: Han family: Zhao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23018-23033 id: wang22x issued: date-parts: - 2022 - 6 - 28 firstpage: 23018 lastpage: 23033 published: 2022-06-28 00:00:00 +0000 - title: 'ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training' abstract: 'Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data. However, training is resource-intensive for edge devices, and limited network bandwidth is often the main bottleneck. Prior work often overcomes the constraints by condensing the models or messages into compact formats, e.g., by gradient compression or distillation. In contrast, we propose ProgFed, the first progressive training framework for efficient and effective federated learning. It inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models. We theoretically prove that ProgFed converges at the same asymptotic rate as standard training on full models. Extensive results on a broad range of architectures, including CNNs (VGG, ResNet, ConvNets) and U-nets, and diverse tasks from simple classification to medical image segmentation show that our highly effective training approach saves up to $20%$ computation and up to $63%$ communication costs for converged models. As our approach is also complementary to prior work on compression, we can achieve a wide range of trade-offs by combining these techniques, showing reduced communication of up to $50\times$ at only $0.1%$ loss in utility.
Code is available at https://github.com/a514514772/ProgFed.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22y.html PDF: https://proceedings.mlr.press/v162/wang22y/wang22y.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22y.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hui-Po family: Wang - given: Sebastian family: Stich - given: Yang family: He - given: Mario family: Fritz editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23034-23054 id: wang22y issued: date-parts: - 2022 - 6 - 28 firstpage: 23034 lastpage: 23054 published: 2022-06-28 00:00:00 +0000 - title: 'Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models and Amortized Policy Search' abstract: 'Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications. Model-based meta reinforcement learning addresses these issues by learning dynamics and leveraging knowledge from prior experience. In this paper, we take a closer look at this framework and propose a new posterior sampling based approach that consists of a new model to identify task dynamics together with an amortized policy optimization step. We show that our model, called a graph structured surrogate model (GSSM), achieves competitive dynamics prediction performance with lower model complexity. Moreover, our approach in policy search is able to obtain high returns and allows fast execution by avoiding test-time policy gradient updates.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22z.html PDF: https://proceedings.mlr.press/v162/wang22z/wang22z.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22z.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qi family: Wang - given: Herke family: Van Hoof editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23055-23077 id: wang22z issued: date-parts: - 2022 - 6 - 28 firstpage: 23055 lastpage: 23077 published: 2022-06-28 00:00:00 +0000 - title: 'Approximately Equivariant Networks for Imperfectly Symmetric Dynamics' abstract: 'Incorporating symmetry as an inductive bias into neural network architecture has led to improvements in generalization, data efficiency, and physical consistency in dynamics modeling. Methods such as CNNs or equivariant neural networks use weight tying to enforce symmetries such as shift invariance or rotational equivariance. However, despite the fact that physical laws obey many symmetries, real-world dynamical data rarely conforms to strict mathematical symmetry either due to noisy or incomplete data or to symmetry breaking features in the underlying dynamical system. We explore approximately equivariant networks which are biased towards preserving symmetry but are not strictly constrained to do so. 
By relaxing equivariance constraints, we find that our models can outperform both baselines with no symmetry bias and baselines with overly strict symmetry in both simulated turbulence domains and real-world multi-stream jet flow.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22aa.html PDF: https://proceedings.mlr.press/v162/wang22aa/wang22aa.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22aa.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rui family: Wang - given: Robin family: Walters - given: Rose family: Yu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23078-23091 id: wang22aa issued: date-parts: - 2022 - 6 - 28 firstpage: 23078 lastpage: 23091 published: 2022-06-28 00:00:00 +0000 - title: 'Three-stage Evolution and Fast Equilibrium for SGD with Non-degenerate Critical Points' abstract: 'We justify the fast equilibrium conjecture on stochastic gradient descent from (Li et al. 2020) under the assumptions that critical points are non-degenerate and the stochastic noise is a standard Gaussian. In this case, we prove that an SGD with constant effective learning rate consists of three stages: descent, diffusion and tunneling, and explicitly identify temporary equilibrium states in the normalized parameter space that can be observed within practical training time. This interprets the gap between the mixing time in the fast equilibrium conjecture and the previously known upper bound. While our assumptions do not represent typical implementations of SGD of neural networks in practice, this is the first description of the three-stage mechanism in any case. The main finding in this mechanism is that a temporary equilibrium of local nature is quickly achieved after polynomial time (in terms of the reciprocal of the intrinsic learning rate) and then stabilizes within observable time scales; and that the temporary equilibrium is in general different from the global Gibbs equilibrium, which will only appear after an exponentially long period beyond typical training limits. Our experiments support that this mechanism may extend to the general case.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22ab.html PDF: https://proceedings.mlr.press/v162/wang22ab/wang22ab.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ab.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yi family: Wang - given: Zhiren family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23092-23113 id: wang22ab issued: date-parts: - 2022 - 6 - 28 firstpage: 23092 lastpage: 23113 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Instance-Level Impact of Fairness Constraints' abstract: 'A variety of fairness constraints have been proposed in the literature to mitigate group-level statistical bias. Their impacts have been largely evaluated for different groups of populations corresponding to a set of sensitive attributes, such as race or gender.
Nonetheless, the community has not sufficiently explored how imposing fairness constraints fares at an instance level. Building on the concept of influence function, a measure that characterizes the impact of a training example on the target model and its predictive performance, this work studies the influence of training examples when fairness constraints are imposed. We find that under certain assumptions, the influence function with respect to fairness constraints can be decomposed into a kernelized combination of training examples. One promising application of the proposed fairness influence function is to identify suspicious training examples that may cause model discrimination by ranking their influence scores. We demonstrate with extensive experiments that training on a subset of weighty data examples leads to lower fairness violations with a trade-off of accuracy.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22ac.html PDF: https://proceedings.mlr.press/v162/wang22ac/wang22ac.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ac.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jialu family: Wang - given: Xin Eric family: Wang - given: Yang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23114-23130 id: wang22ac issued: date-parts: - 2022 - 6 - 28 firstpage: 23114 lastpage: 23130 published: 2022-06-28 00:00:00 +0000 - title: 'Tractable Uncertainty for Structure Learning' abstract: 'Bayesian structure learning allows one to capture uncertainty over the causal directed acyclic graph (DAG) responsible for generating given data. In this work, we present Tractable Uncertainty for STructure learning (TRUST), a framework for approximate posterior inference that relies on probabilistic circuits as a representation of our posterior belief. In contrast to sample-based posterior approximations, our representation can capture a much richer space of DAGs, while being able to tractably answer a range of useful inference queries. We empirically demonstrate how probabilistic circuits can be used as an augmented representation for structure learning methods, leading to improvement in both the quality of inferred structures and posterior uncertainty. Experimental results also demonstrate the improved representational capacity of TRUST, outperforming competing methods on conditional query answering.'
volume: 162 URL: https://proceedings.mlr.press/v162/wang22ad.html PDF: https://proceedings.mlr.press/v162/wang22ad/wang22ad.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ad.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Benjie family: Wang - given: Matthew R family: Wicker - given: Marta family: Kwiatkowska editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23131-23150 id: wang22ad issued: date-parts: - 2022 - 6 - 28 firstpage: 23131 lastpage: 23150 published: 2022-06-28 00:00:00 +0000 - title: 'Causal Dynamics Learning for Task-Independent State Abstraction' abstract: 'Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model which is vulnerable to spurious correlations and therefore generalizes poorly to unseen states. In this paper, we introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL), which first learns a theoretically proved causal dynamics model that removes unnecessary dependencies between state variables and the action, thus generalizing well to unseen states. A state abstraction can then be derived from the learned dynamics, which not only improves sample efficiency but also applies to a wider range of tasks than existing state abstraction methods. Evaluated on two simulated environments and downstream tasks, both the dynamics model and policies learned by the proposed method generalize well to unseen states and the derived state abstraction improves sample efficiency compared to learning without it.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22ae.html PDF: https://proceedings.mlr.press/v162/wang22ae/wang22ae.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ae.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zizhao family: Wang - given: Xuesu family: Xiao - given: Zifan family: Xu - given: Yuke family: Zhu - given: Peter family: Stone editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23151-23180 id: wang22ae issued: date-parts: - 2022 - 6 - 28 firstpage: 23151 lastpage: 23180 published: 2022-06-28 00:00:00 +0000 - title: 'Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms' abstract: 'We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arms setting, in which several plays can share the same arm. Furthermore, each shareable arm has a finite reward capacity and a “per-load” reward distribution, both of which are unknown to the learner. The reward from a shareable arm is load-dependent, which is the “per-load” reward multiplying either the number of plays pulling the arm, or its reward capacity when the number of plays exceeds the capacity limit. When the “per-load” reward follows a Gaussian distribution, we prove a sample complexity lower bound of learning the capacity from load-dependent rewards and also a regret lower bound of this new MP-MAB problem. 
We devise a capacity estimator whose sample complexity upper bound matches the lower bound in terms of reward means and capacities. We also propose an online learning algorithm to address the problem and prove its regret upper bound. This regret upper bound’s first term is the same as regret lower bound’s, and its second and third terms also evidently correspond to lower bound’s. Extensive experiments validate our algorithm’s performance and also its gain in 5G & 4G base station selection.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22af.html PDF: https://proceedings.mlr.press/v162/wang22af/wang22af.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22af.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xuchuang family: Wang - given: Hong family: Xie - given: John C. S. family: Lui editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23181-23212 id: wang22af issued: date-parts: - 2022 - 6 - 28 firstpage: 23181 lastpage: 23212 published: 2022-06-28 00:00:00 +0000 - title: 'Generative Coarse-Graining of Molecular Conformations' abstract: 'Coarse-graining (CG) of molecular simulations simplifies the particle representation by grouping selected atoms into pseudo-beads and therefore drastically accelerates simulation. However, such CG procedure induces information losses, which makes accurate backmapping, i.e., restoring fine-grained (FG) coordinates from CG coordinates, a long-standing challenge. Inspired by the recent progress in generative models and equivariant networks, we propose a novel model that rigorously embeds the vital probabilistic nature and geometrical consistency requirements of the backmapping transformation. Our model encodes the FG uncertainties into an invariant latent space and decodes them back to FG geometries via equivariant convolutions. To standardize the evaluation of this domain, we further provide three comprehensive benchmarks based on molecular dynamics trajectories. Extensive experiments show that our approach always recovers more realistic structures and outperforms existing data-driven methods with a significant margin.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22ag.html PDF: https://proceedings.mlr.press/v162/wang22ag/wang22ag.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ag.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wujie family: Wang - given: Minkai family: Xu - given: Chen family: Cai - given: Benjamin K family: Miller - given: Tess family: Smidt - given: Yusu family: Wang - given: Jian family: Tang - given: Rafael family: Gomez-Bombarelli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23213-23236 id: wang22ag issued: date-parts: - 2022 - 6 - 28 firstpage: 23213 lastpage: 23236 published: 2022-06-28 00:00:00 +0000 - title: 'Nonparametric Embeddings of Sparse High-Order Interaction Events' abstract: 'High-order interaction events are common in real-world applications. 
Learning embeddings that encode the complex relationships of the participants from these events is of great importance in knowledge mining and predictive tasks. Despite the success of existing approaches, e.g., Poisson tensor factorization, they ignore the sparse structure underlying the data, namely, the interactions that occur are far fewer than the possible interactions among all the participants. In this paper, we propose Nonparametric Embeddings of Sparse High-order interaction events (NESH). We hybridize a sparse hypergraph (tensor) process and a matrix Gaussian process to capture both the asymptotic structural sparsity within the interactions and nonlinear temporal relationships between the participants. We prove strong asymptotic bounds (including both a lower and an upper bound) of the sparse ratio, which reveals the asymptotic properties of the sampled structure. We use batch-normalization, stick-breaking construction and sparse variational GP approximations to develop an efficient, scalable model inference algorithm. We demonstrate the advantage of our approach in several real-world applications.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22ah.html PDF: https://proceedings.mlr.press/v162/wang22ah/wang22ah.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ah.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zheng family: Wang - given: Yiming family: Xu - given: Conor family: Tillinghast - given: Shibo family: Li - given: Akil family: Narayan - given: Shandian family: Zhe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23237-23253 id: wang22ah issued: date-parts: - 2022 - 6 - 28 firstpage: 23237 lastpage: 23253 published: 2022-06-28 00:00:00 +0000 - title: 'When Are Linear Stochastic Bandits Attackable?' abstract: 'We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. Perhaps surprisingly, we first show that some attack goals can never be achieved. This is in sharp contrast to context-free stochastic bandits, and is intrinsically due to the correlation among arms in linear stochastic bandits. Motivated by this finding, this paper studies the attackability of a $k$-armed linear bandit environment. We first provide a complete necessity and sufficiency characterization of attackability based on the geometry of the arms’ context vectors. We then propose a two-stage attack method against LinUCB and Robust Phase Elimination. The method first asserts whether the given environment is attackable; and if yes, it poisons the rewards to force the algorithm to pull a target arm a linear number of times using only a sublinear cost. Numerical experiments further validate the effectiveness and cost-efficiency of the proposed attack method.'
volume: 162 URL: https://proceedings.mlr.press/v162/wang22ai.html PDF: https://proceedings.mlr.press/v162/wang22ai/wang22ai.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ai.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Huazheng family: Wang - given: Haifeng family: Xu - given: Hongning family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23254-23273 id: wang22ai issued: date-parts: - 2022 - 6 - 28 firstpage: 23254 lastpage: 23273 published: 2022-06-28 00:00:00 +0000 - title: 'DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks' abstract: 'Data-parallel distributed training (DDT) has become the de-facto standard for accelerating the training of most deep learning tasks on massively parallel hardware. In the DDT paradigm, the communication overhead of gradient synchronization is the major efficiency bottleneck. A widely adopted approach to tackle this issue is gradient sparsification (GS). However, the current GS methods introduce significant new overhead in compressing the gradients, outweighing the communication overhead and becoming the new efficiency bottleneck. In this paper, we propose DRAGONN, a randomized hashing algorithm for GS in DDT. DRAGONN can significantly reduce the compression time by up to 70% compared to state-of-the-art GS approaches, and achieve up to 3.52x speedup in total training throughput.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22aj.html PDF: https://proceedings.mlr.press/v162/wang22aj/wang22aj.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22aj.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhuang family: Wang - given: Zhaozhuo family: Xu - given: Xinyu family: Wu - given: Anshumali family: Shrivastava - given: T. S. Eugene family: Ng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23274-23291 id: wang22aj issued: date-parts: - 2022 - 6 - 28 firstpage: 23274 lastpage: 23291 published: 2022-06-28 00:00:00 +0000 - title: 'Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications' abstract: 'This paper studies stochastic optimization for a sum of compositional functions, where the inner-level function of each summand is coupled with the corresponding summation index. We refer to this family of problems as finite-sum coupled compositional optimization (FCCO). It has broad applications in machine learning for optimizing non-convex or convex compositional measures/objectives such as average precision (AP), p-norm push, listwise ranking losses, neighborhood component analysis (NCA), deep survival analysis, deep latent variable models, etc., which deserves finer analysis. Yet, existing algorithms and analyses are restricted in one or other aspects. The contribution of this paper is to provide a comprehensive convergence analysis of a simple stochastic algorithm for both non-convex and convex objectives. 
Our key result is the improved oracle complexity with the parallel speed-up by using the moving-average based estimator with mini-batching. Our theoretical analysis also exhibits new insights for improving the practical implementation by sampling the batches of equal size for the outer and inner levels. Numerical experiments on AP maximization, NCA, and p-norm push corroborate some aspects of the theory.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22ak.html PDF: https://proceedings.mlr.press/v162/wang22ak/wang22ak.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ak.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Bokun family: Wang - given: Tianbao family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23292-23317 id: wang22ak issued: date-parts: - 2022 - 6 - 28 firstpage: 23292 lastpage: 23317 published: 2022-06-28 00:00:00 +0000 - title: 'OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework' abstract: 'In this work, we pursue a unified paradigm for multimodal pretraining to break the shackles of complex task/modality-specific customization. We propose OFA, a Task-Agnostic and Modality-Agnostic framework that supports Task Comprehensiveness. OFA unifies a diverse set of cross-modal and unimodal tasks, including image generation, visual grounding, image captioning, image classification, language modeling, etc., in a simple sequence-to-sequence learning framework. OFA follows the instruction-based learning in both pretraining and finetuning stages, requiring no extra task-specific layers for downstream tasks. In comparison with the recent state-of-the-art vision & language models that rely on extremely large cross-modal datasets, OFA is pretrained on only 20M publicly available image-text pairs. Despite its simplicity and relatively small-scale training data, OFA achieves new SOTAs in a series of cross-modal tasks while attaining highly competitive performances on uni-modal tasks. Our further analysis indicates that OFA can also effectively transfer to unseen tasks and unseen domains. Our code and models are publicly available at https://github.com/OFA-Sys/OFA.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22al.html PDF: https://proceedings.mlr.press/v162/wang22al/wang22al.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22al.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peng family: Wang - given: An family: Yang - given: Rui family: Men - given: Junyang family: Lin - given: Shuai family: Bai - given: Zhikang family: Li - given: Jianxin family: Ma - given: Chang family: Zhou - given: Jingren family: Zhou - given: Hongxia family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23318-23340 id: wang22al issued: date-parts: - 2022 - 6 - 28 firstpage: 23318 lastpage: 23340 published: 2022-06-28 00:00:00 +0000 - title: 'How Powerful are Spectral Graph Neural Networks' abstract: 'Spectral Graph Neural Network is a kind of Graph Neural Network (GNN) based on graph signal filters. Some models able to learn arbitrary spectral filters have emerged recently. However, few works analyze the expressive power of spectral GNNs. This paper studies spectral GNNs’ expressive power theoretically. We first prove that even spectral GNNs without nonlinearity can produce arbitrary graph signals and give two conditions for reaching universality. They are: 1) no multiple eigenvalues of graph Laplacian, and 2) no missing frequency components in node features. We also establish a connection between the expressive power of spectral GNNs and Graph Isomorphism (GI) testing, the latter of which is often used to characterize spatial GNNs’ expressive power. Moreover, we study the difference in empirical performance among different spectral GNNs with the same expressive power from an optimization perspective, and motivate the use of an orthogonal basis whose weight function corresponds to the graph signal density in the spectrum. Inspired by the analysis, we propose JacobiConv, which uses Jacobi basis due to its orthogonality and flexibility to adapt to a wide range of weight functions. JacobiConv deserts nonlinearity while outperforming all baselines on both synthetic and real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22am.html PDF: https://proceedings.mlr.press/v162/wang22am/wang22am.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22am.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiyuan family: Wang - given: Muhan family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23341-23362 id: wang22am issued: date-parts: - 2022 - 6 - 28 firstpage: 23341 lastpage: 23362 published: 2022-06-28 00:00:00 +0000 - title: 'Thompson Sampling for Robust Transfer in Multi-Task Bandits' abstract: 'We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. 
While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly-optimal using a novel concentration inequality for multi-task data aggregation at random stopping times. Finally, we evaluate the algorithm on synthetic data and show that the TS-type algorithm enjoys superior empirical performance in comparison with the UCB-based algorithm and a baseline algorithm that performs TS for each individual task without transfer.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22an.html PDF: https://proceedings.mlr.press/v162/wang22an/wang22an.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22an.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhi family: Wang - given: Chicheng family: Zhang - given: Kamalika family: Chaudhuri editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23363-23416 id: wang22an issued: date-parts: - 2022 - 6 - 28 firstpage: 23363 lastpage: 23416 published: 2022-06-28 00:00:00 +0000 - title: 'Individual Reward Assisted Multi-Agent Reinforcement Learning' abstract: 'In many real-world multi-agent systems, the sparsity of team rewards often makes it difficult for an algorithm to successfully learn a cooperative team policy. At present, the common way for solving this problem is to design some dense individual rewards for the agents to guide the cooperation. However, most existing works utilize individual rewards in ways that do not always promote teamwork and sometimes are even counterproductive. In this paper, we propose Individual Reward Assisted Team Policy Learning (IRAT), which learns two policies for each agent from the dense individual reward and the sparse team reward with discrepancy constraints for updating the two policies mutually. Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22ao.html PDF: https://proceedings.mlr.press/v162/wang22ao/wang22ao.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ao.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Li family: Wang - given: Yupeng family: Zhang - given: Yujing family: Hu - given: Weixun family: Wang - given: Chongjie family: Zhang - given: Yang family: Gao - given: Jianye family: Hao - given: Tangjie family: Lv - given: Changjie family: Fan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23417-23432 id: wang22ao issued: date-parts: - 2022 - 6 - 28 firstpage: 23417 lastpage: 23432 published: 2022-06-28 00:00:00 +0000 - title: 'Removing Batch Normalization Boosts Adversarial Training' abstract: 'Adversarial training (AT) defends deep neural networks against adversarial attacks. One challenge that limits its practical application is the performance degradation on clean samples. A major bottleneck identified by previous works is the widely used batch normalization (BN), which struggles to model the different statistics of clean and adversarial training samples in AT. Although the dominant approach is to extend BN to capture this mixture of distribution, we propose to completely eliminate this bottleneck by removing all BN layers in AT. Our normalizer-free robust training (NoFrost) method extends recent advances in normalizer-free networks to AT for its unexplored advantage on handling the mixture distribution challenge. We show that NoFrost achieves adversarial robustness with only a minor sacrifice on clean sample accuracy. On ImageNet with ResNet50, NoFrost achieves $74.06%$ clean accuracy, which drops merely $2.00%$ from standard training. In contrast, BN-based AT obtains $59.28%$ clean accuracy, suffering a significant $16.78%$ drop from standard training. In addition, NoFrost achieves a $23.56%$ adversarial robustness against PGD attack, which improves the $13.57%$ robustness in BN-based AT. We observe better model smoothness and larger decision margins from NoFrost, which make the models less sensitive to input perturbations and thus more robust. Moreover, when incorporating more data augmentations into NoFrost, it achieves comprehensive robustness against multiple distribution shifts. Code and pre-trained models are public at https://github.com/amazon-research/normalizer-free-robust-training.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22ap.html PDF: https://proceedings.mlr.press/v162/wang22ap/wang22ap.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ap.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haotao family: Wang - given: Aston family: Zhang - given: Shuai family: Zheng - given: Xingjian family: Shi - given: Mu family: Li - given: Zhangyang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23433-23445 id: wang22ap issued: date-parts: - 2022 - 6 - 28 firstpage: 23433 lastpage: 23445 published: 2022-06-28 00:00:00 +0000 - title: 'Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition' abstract: 'Existing out-of-distribution (OOD) detection methods are typically benchmarked on training sets with balanced class distributions. However, in real-world applications, it is common for the training sets to have long-tailed distributions. In this work, we first demonstrate that existing OOD detection methods commonly suffer from significant performance degradation when the training set is long-tail distributed. Through analysis, we posit that this is because the models struggle to distinguish the minority tail-class in-distribution samples, from the true OOD samples, making the tail classes more prone to be falsely detected as OOD. To solve this problem, we propose Partial and Asymmetric Supervised Contrastive Learning (PASCL), which explicitly encourages the model to distinguish between tail-class in-distribution samples and OOD samples. To further boost in-distribution classification accuracy, we propose Auxiliary Branch Finetuning, which uses two separate branches of BN and classification layers for anomaly detection and in-distribution classification, respectively. The intuition is that in-distribution and OOD anomaly data have different underlying distributions. Our method outperforms previous state-of-the-art method by $1.29%$, $1.45%$, $0.69%$ anomaly detection false positive rate (FPR) and $3.24%$, $4.06%$, $7.89%$ in-distribution classification accuracy on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT, respectively. Code and pre-trained models are available at https://github.com/amazon-research/long-tailed-ood-detection.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22aq.html PDF: https://proceedings.mlr.press/v162/wang22aq/wang22aq.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22aq.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haotao family: Wang - given: Aston family: Zhang - given: Yi family: Zhu - given: Shuai family: Zheng - given: Mu family: Li - given: Alex J family: Smola - given: Zhangyang family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23446-23458 id: wang22aq issued: date-parts: - 2022 - 6 - 28 firstpage: 23446 lastpage: 23458 published: 2022-06-28 00:00:00 +0000 - title: 'Nonparametric Factor Trajectory Learning for Dynamic Tensor Decomposition' abstract: 'Tensor decomposition is a fundamental framework to analyze data that can be represented by multi-dimensional arrays. In practice, tensor data are often accompanied with temporal information, namely the time points when the entry values were generated. This information implies abundant, complex temporal variation patterns. However, current methods always assume the factor representations of the entities in each tensor mode are static, and never consider their temporal evolution. To fill this gap, we propose NONparametric FActor Trajectory learning for dynamic tensor decomposition (NONFAT). We place Gaussian process (GP) priors in the frequency domain and conduct inverse Fourier transform via Gauss-Laguerre quadrature to sample the trajectory functions. In this way, we can overcome data sparsity and obtain robust trajectory estimates across long time horizons. Given the trajectory values at specific time points, we use a second-level GP to sample the entry values and to capture the temporal relationship between the entities. For efficient and scalable inference, we leverage the matrix Gaussian structure in the model, introduce a matrix Gaussian posterior, and develop a nested sparse variational learning algorithm. We have shown the advantage of our method in several real-world applications.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22ar.html PDF: https://proceedings.mlr.press/v162/wang22ar/wang22ar.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22ar.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zheng family: Wang - given: Shandian family: Zhe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23459-23469 id: wang22ar issued: date-parts: - 2022 - 6 - 28 firstpage: 23459 lastpage: 23469 published: 2022-06-28 00:00:00 +0000 - title: 'Thompson Sampling for (Combinatorial) Pure Exploration' abstract: 'Existing methods of combinatorial pure exploration mainly focus on the UCB approach. 
To make the algorithm efficient, they usually use the sum of upper confidence bounds within arm set $S$ to represent the upper confidence bound of $S$, which can be much larger than the tight upper confidence bound of $S$ and leads to a much higher complexity than necessary, since the empirical means of different arms in $S$ are independent. To deal with this challenge, we explore the idea of Thompson Sampling (TS) that uses independent random samples instead of the upper confidence bounds, and design the first TS-based algorithm TS-Explore for (combinatorial) pure exploration. In TS-Explore, the sum of independent random samples within arm set $S$ will not exceed the tight upper confidence bound of $S$ with high probability. Hence it solves the above challenge, and achieves a lower complexity upper bound than existing efficient UCB-based algorithms in general combinatorial pure exploration. As for pure exploration of classic multi-armed bandit, we show that TS-Explore achieves an asymptotically optimal complexity upper bound.' volume: 162 URL: https://proceedings.mlr.press/v162/wang22as.html PDF: https://proceedings.mlr.press/v162/wang22as/wang22as.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22as.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Siwei family: Wang - given: Jun family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23470-23483 id: wang22as issued: date-parts: - 2022 - 6 - 28 firstpage: 23470 lastpage: 23483 published: 2022-06-28 00:00:00 +0000 - title: 'Policy Gradient Method For Robust Reinforcement Learning' abstract: 'This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning is to learn a policy robust to model mismatch between simulator and real environment. We first develop the robust policy (sub-)gradient, which is applicable for any differentiable parametric policy class. We show that the proposed robust policy gradient method converges to the global optimum asymptotically under direct policy parameterization. We further develop a smoothed robust policy gradient method, and show that to achieve an $\epsilon$-global optimum, the complexity is $\mathcal O(\epsilon^{-3})$. We then extend our methodology to the general model-free setting, and design the robust actor-critic method with differentiable parametric policy class and value function. We further characterize its asymptotic convergence and sample complexity under the tabular setting. Finally, we provide simulation results to demonstrate the robustness of our methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wang22at.html PDF: https://proceedings.mlr.press/v162/wang22at/wang22at.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wang22at.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yue family: Wang - given: Shaofeng family: Zou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23484-23526 id: wang22at issued: date-parts: - 2022 - 6 - 28 firstpage: 23484 lastpage: 23526 published: 2022-06-28 00:00:00 +0000 - title: 'Certifying Out-of-Domain Generalization for Blackbox Functions' abstract: 'Certifying the robustness of model performance under bounded data distribution drifts has recently attracted intensive interest under the umbrella of distributional robustness. However, existing techniques either make strong assumptions on the model class and loss functions that can be certified, such as smoothness expressed via Lipschitz continuity of gradients, or require solving complex optimization problems. As a result, the wider application of these techniques is currently limited by their scalability and flexibility — these techniques often do not scale to large-scale datasets with modern deep neural networks or cannot handle loss functions that may be non-smooth, such as the 0-1 loss. In this paper, we focus on the problem of certifying distributional robustness for blackbox models and bounded loss functions, and propose a novel certification framework based on the Hellinger distance. Our certification technique scales to ImageNet-scale datasets, complex models, and a diverse set of loss functions. We then focus on one specific application enabled by such scalability and flexibility, i.e., certifying out-of-domain generalization for large neural networks and loss functions such as accuracy and AUC. We experimentally validate our certification method on a number of datasets, ranging from ImageNet, where we provide the first non-vacuous certified out-of-domain generalization, to smaller classification tasks where we are able to compare with the state-of-the-art and show that our method performs considerably better.'
volume: 162 URL: https://proceedings.mlr.press/v162/weber22a.html PDF: https://proceedings.mlr.press/v162/weber22a/weber22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-weber22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Maurice G family: Weber - given: Linyi family: Li - given: Boxin family: Wang - given: Zhikuan family: Zhao - given: Bo family: Li - given: Ce family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23527-23548 id: weber22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23527 lastpage: 23548 published: 2022-06-28 00:00:00 +0000 - title: 'More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize' abstract: 'Of theories for why large-scale machine learning models generalize despite being vastly overparameterized, which of their assumptions are needed to capture the qualitative phenomena of generalization in the real world? On one hand, we find that most theoretical analyses fall short of capturing these qualitative phenomena even for kernel regression, when applied to kernels derived from large-scale neural networks (e.g., ResNet-50) and real data (e.g., CIFAR-100). On the other hand, we find that the classical GCV estimator (Craven and Wahba, 1978) accurately predicts generalization risk even in such overparameterized settings. To bolster this empirical finding, we prove that the GCV estimator converges to the generalization risk whenever a local random matrix law holds. Finally, we apply this random matrix theory lens to explain why pretrained representations generalize better as well as what factors govern scaling laws for kernel regression. Our findings suggest that random matrix theory, rather than just being a toy model, may be central to understanding the properties of neural representations in practice.' volume: 162 URL: https://proceedings.mlr.press/v162/wei22a.html PDF: https://proceedings.mlr.press/v162/wei22a/wei22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wei22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alexander family: Wei - given: Wei family: Hu - given: Jacob family: Steinhardt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23549-23588 id: wei22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23549 lastpage: 23588 published: 2022-06-28 00:00:00 +0000 - title: 'To Smooth or Not? When Label Smoothing Meets Noisy Labels' abstract: 'Label smoothing (LS) is an arising learning paradigm that uses the positively weighted average of both the hard training labels and uniformly distributed soft labels. It was shown that LS serves as a regularizer for training data with hard labels and therefore improves the generalization of the model. Later it was reported LS even helps with improving robustness when learning with noisy labels. However, we observed that the advantage of LS vanishes when we operate in a high label noise regime. 
Intuitively speaking, this is due to the increased entropy of P(noisy label|X) when the noise rate is high, in which case further applying LS tends to “over-smooth” the estimated posterior. We proceeded to discover that several learning-with-noisy-labels solutions in the literature instead relate more closely to negative/not label smoothing (NLS), which acts counter to LS and is defined as using a negative weight to combine the hard and soft labels! We provide an understanding of the properties of LS and NLS when learning with noisy labels. Among other established properties, we theoretically show that NLS is more beneficial when the label noise rates are high. We also provide extensive experimental results on multiple benchmarks to support our findings. Code is publicly available at https://github.com/UCSC-REAL/negative-label-smoothing.' volume: 162 URL: https://proceedings.mlr.press/v162/wei22b.html PDF: https://proceedings.mlr.press/v162/wei22b/wei22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wei22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiaheng family: Wei - given: Hangyu family: Liu - given: Tongliang family: Liu - given: Gang family: Niu - given: Masashi family: Sugiyama - given: Yang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23589-23614 id: wei22b issued: date-parts: - 2022 - 6 - 28 firstpage: 23589 lastpage: 23614 published: 2022-06-28 00:00:00 +0000 - title: 'Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets' abstract: 'Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance. Recent studies found that directly training with out-of-distribution data (i.e., open-set samples) in a semi-supervised manner would harm the generalization performance. In this work, we theoretically show that out-of-distribution data can still be leveraged to augment the minority classes from a Bayesian perspective. Based on this motivation, we propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset. For each open-set instance, the label is sampled from our pre-defined distribution that is complementary to the distribution of original class priors. We empirically show that Open-sampling not only re-balances the class priors but also encourages the neural network to learn separable representations. Extensive experiments demonstrate that our proposed method significantly outperforms existing data re-balancing methods and can boost the performance of existing state-of-the-art methods.'
volume: 162 URL: https://proceedings.mlr.press/v162/wei22c.html PDF: https://proceedings.mlr.press/v162/wei22c/wei22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wei22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hongxin family: Wei - given: Lue family: Tao - given: Renchunzi family: Xie - given: Lei family: Feng - given: Bo family: An editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23615-23630 id: wei22c issued: date-parts: - 2022 - 6 - 28 firstpage: 23615 lastpage: 23630 published: 2022-06-28 00:00:00 +0000 - title: 'Mitigating Neural Network Overconfidence with Logit Normalization' abstract: 'Detecting out-of-distribution inputs is critical for the safe deployment of machine learning models in the real world. However, neural networks are known to suffer from the overconfidence issue, where they produce abnormally high confidence for both in- and out-of-distribution inputs. In this work, we show that this issue can be mitigated through Logit Normalization (LogitNorm)—a simple fix to the cross-entropy loss—by enforcing a constant vector norm on the logits in training. Our method is motivated by the analysis that the norm of the logit keeps increasing during training, leading to overconfident output. Our key idea behind LogitNorm is thus to decouple the influence of output’s norm during network optimization. Trained with LogitNorm, neural networks produce highly distinguishable confidence scores between in- and out-of-distribution data. Extensive experiments demonstrate the superiority of LogitNorm, reducing the average FPR95 by up to 42.30% on common benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/wei22d.html PDF: https://proceedings.mlr.press/v162/wei22d/wei22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wei22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hongxin family: Wei - given: Renchunzi family: Xie - given: Hao family: Cheng - given: Lei family: Feng - given: Bo family: An - given: Yixuan family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23631-23644 id: wei22d issued: date-parts: - 2022 - 6 - 28 firstpage: 23631 lastpage: 23644 published: 2022-06-28 00:00:00 +0000 - title: 'Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics' abstract: 'Offline reinforcement learning leverages large datasets to train policies without interactions with the environment. The learned policies may then be deployed in real-world settings where interactions are costly or dangerous. Current algorithms over-fit to the training dataset and as a consequence perform poorly when deployed to out-of-distribution generalizations of the environment. We aim to address these limitations by learning a Koopman latent representation which allows us to infer symmetries of the system’s underlying dynamic. 
The latter is then utilized to extend the otherwise static offline dataset during training; this constitutes a novel data augmentation framework which reflects the system’s dynamics and is thus to be interpreted as an exploration of the environment’s phase space. To obtain the symmetries, we employ Koopman theory, in which nonlinear dynamics are represented in terms of a linear operator acting on the space of measurement functions of the system. We provide novel theoretical results on the existence and nature of symmetries relevant for control systems such as reinforcement learning settings. Moreover, we empirically evaluate our method on several benchmark offline reinforcement learning tasks and datasets, including D4RL, Metaworld and Robosuite, and find that by using our framework we consistently improve the state-of-the-art of model-free Q-learning methods.' volume: 162 URL: https://proceedings.mlr.press/v162/weissenbacher22a.html PDF: https://proceedings.mlr.press/v162/weissenbacher22a/weissenbacher22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-weissenbacher22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Matthias family: Weissenbacher - given: Samarth family: Sinha - given: Animesh family: Garg - given: Kawahara family: Yoshinobu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23645-23667 id: weissenbacher22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23645 lastpage: 23667 published: 2022-06-28 00:00:00 +0000 - title: 'Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification' abstract: 'Federated learning (FL) has rapidly risen in popularity due to its promise of privacy and efficiency. Previous works have exposed privacy vulnerabilities in the FL pipeline by recovering user data from gradient updates. However, existing attacks fail to address realistic settings because they either 1) require toy settings with very small batch sizes, or 2) require unrealistic and conspicuous architecture modifications. We introduce a new strategy that dramatically elevates existing attacks to operate on batches of arbitrarily large size, and without architectural modifications. Our model-agnostic strategy only requires modifications to the model parameters sent to the user, which is a realistic threat model in many scenarios. We demonstrate the strategy in challenging large-scale settings, obtaining high-fidelity data extraction in both cross-device and cross-silo federated learning. Code is available at https://github.com/JonasGeiping/breaching.' volume: 162 URL: https://proceedings.mlr.press/v162/wen22a.html PDF: https://proceedings.mlr.press/v162/wen22a/wen22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wen22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuxin family: Wen - given: Jonas A.
family: Geiping - given: Liam family: Fowl - given: Micah family: Goldblum - given: Tom family: Goldstein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23668-23684 id: wen22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23668 lastpage: 23684 published: 2022-06-28 00:00:00 +0000 - title: 'BabelTower: Learning to Auto-parallelized Program Translation' abstract: 'GPUs have become the dominant computing platforms for many applications, while programming GPUs with the widely-used CUDA parallel programming model is difficult. As sequential C code is relatively easy to obtain either from legacy repositories or by manual implementation, automatically translating C to its parallel CUDA counterpart is promising to relieve the burden of GPU programming. However, because of huge differences between the sequential C and the parallel CUDA programming model, existing approaches fail to conduct the challenging auto-parallelized program translation. In this paper, we propose a learning-based framework, i.e., BabelTower, to address this problem. We first create a large-scale dataset consisting of compute-intensive function-level monolingual corpora. We further propose using back-translation with a discriminative reranker to cope with unpaired corpora and parallel semantic conversion. Experimental results show that BabelTower outperforms state-of-the-art by 1.79, 6.09, and 9.39 in terms of BLEU, CodeBLEU, and specifically designed ParaBLEU, respectively. The CUDA code generated by BabelTower attains a speedup of up to 347x over the sequential C code, and the developer productivity is improved by at most 3.8x.' volume: 162 URL: https://proceedings.mlr.press/v162/wen22b.html PDF: https://proceedings.mlr.press/v162/wen22b/wen22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wen22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yuanbo family: Wen - given: Qi family: Guo - given: Qiang family: Fu - given: Xiaqing family: Li - given: Jianxing family: Xu - given: Yanlin family: Tang - given: Yongwei family: Zhao - given: Xing family: Hu - given: Zidong family: Du - given: Ling family: Li - given: Chao family: Wang - given: Xuehai family: Zhou - given: Yunji family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23685-23700 id: wen22b issued: date-parts: - 2022 - 6 - 28 firstpage: 23685 lastpage: 23700 published: 2022-06-28 00:00:00 +0000 - title: 'Random Forest Density Estimation' abstract: 'We propose a density estimation algorithm called random forest density estimation (RFDE) based on random trees where the split of cell is along the midpoint of the randomly chosen dimension. By combining the efficient random tree density estimation (RTDE) and the ensemble procedure, RFDE can alleviate the problems of boundary discontinuity suffered by partition-based density estimations. From the theoretical perspective, we first prove the fast convergence rates of RFDE if the density function lies in the Hölder space $C^{0,\alpha}$. 
Moreover, if the target function resides in the subspace $C^{1,\alpha}$, which contains smoother density functions, we for the first time manage to explain the benefits of the ensemble learning in density estimation. To be specific, we show that the upper bound of the ensemble estimator RFDE turns out to be strictly smaller than the lower bound of its base estimator RTDE in terms of convergence rates. In the experiments, we verify the theoretical results and show the promising performance of RFDE on both synthetic and real world datasets. Moreover, we evaluate our RFDE through the problem of anomaly detection as a possible application.' volume: 162 URL: https://proceedings.mlr.press/v162/wen22c.html PDF: https://proceedings.mlr.press/v162/wen22c/wen22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wen22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hongwei family: Wen - given: Hanyuan family: Hang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23701-23722 id: wen22c issued: date-parts: - 2022 - 6 - 28 firstpage: 23701 lastpage: 23722 published: 2022-06-28 00:00:00 +0000 - title: 'Fighting Fire with Fire: Avoiding DNN Shortcuts through Priming' abstract: 'Across applications spanning supervised classification and sequential control, deep learning has been reported to find “shortcut” solutions that fail catastrophically under minor changes in the data distribution. In this paper, we show empirically that DNNs can be coaxed to avoid poor shortcuts by providing an additional “priming” feature computed from key input features, usually a coarse output estimate. Priming relies on approximate domain knowledge of these task-relevant key input features, which is often easy to obtain in practical settings. For example, one might prioritize recent frames over past frames in a video input for visual imitation learning, or salient foreground over background pixels for image classification. On NICO image classification, MuJoCo continuous control, and CARLA autonomous driving, our priming strategy works significantly better than several popular state-of-the-art approaches for feature selection and data augmentation. We connect these empirical findings to recent theoretical results on DNN optimization, and argue theoretically that priming distracts the optimizer away from poor shortcuts by creating better, simpler shortcuts.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wen22d.html PDF: https://proceedings.mlr.press/v162/wen22d/wen22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wen22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chuan family: Wen - given: Jianing family: Qian - given: Jierui family: Lin - given: Jiaye family: Teng - given: Dinesh family: Jayaraman - given: Yang family: Gao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23723-23750 id: wen22d issued: date-parts: - 2022 - 6 - 28 firstpage: 23723 lastpage: 23750 published: 2022-06-28 00:00:00 +0000 - title: 'Preconditioning for Scalable Gaussian Process Hyperparameter Optimization' abstract: 'Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There our approach accelerates training by up to an order of magnitude.' volume: 162 URL: https://proceedings.mlr.press/v162/wenger22a.html PDF: https://proceedings.mlr.press/v162/wenger22a/wenger22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wenger22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jonathan family: Wenger - given: Geoff family: Pleiss - given: Philipp family: Hennig - given: John family: Cunningham - given: Jacob family: Gardner editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23751-23780 id: wenger22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23751 lastpage: 23780 published: 2022-06-28 00:00:00 +0000 - title: 'Measure Estimation in the Barycentric Coding Model' abstract: 'This paper considers the problem of measure estimation under the barycentric coding model (BCM), in which an unknown measure is assumed to belong to the set of Wasserstein-2 barycenters of a finite set of known measures. Estimating a measure under this model is equivalent to estimating the unknown barycentric coordinates. We provide novel geometrical, statistical, and computational insights for measure estimation under the BCM, consisting of three main results. 
Our first main result leverages the Riemannian geometry of Wasserstein-2 space to provide a procedure for recovering the barycentric coordinates as the solution to a quadratic optimization problem assuming access to the true reference measures. The essential geometric insight is that the parameters of this quadratic problem are determined by inner products between the optimal displacement maps from the given measure to the reference measures defining the BCM. Our second main result then establishes an algorithm for solving for the coordinates in the BCM when all the measures are observed empirically via i.i.d. samples. We prove precise rates of convergence for this algorithm—determined by the smoothness of the underlying measures and their dimensionality—thereby guaranteeing its statistical consistency. Finally, we demonstrate the utility of the BCM and associated estimation procedures in three application areas: (i) covariance estimation for Gaussian measures; (ii) image processing; and (iii) natural language processing.' volume: 162 URL: https://proceedings.mlr.press/v162/werenski22a.html PDF: https://proceedings.mlr.press/v162/werenski22a/werenski22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-werenski22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Matthew family: Werenski - given: Ruijie family: Jiang - given: Abiy family: Tasissa - given: Shuchin family: Aeron - given: James M family: Murphy editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23781-23803 id: werenski22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23781 lastpage: 23803 published: 2022-06-28 00:00:00 +0000 - title: 'COLA: Consistent Learning with Opponent-Learning Awareness' abstract: 'Learning in general-sum games is unstable and frequently leads to socially undesirable (Pareto-dominated) outcomes. To mitigate this, Learning with Opponent-Learning Awareness (LOLA) introduced opponent shaping to this setting, by accounting for each agent’s influence on their opponents’ anticipated learning steps. However, the original LOLA formulation (and follow-up work) is inconsistent because LOLA models other agents as naive learners rather than LOLA agents. In previous work, this inconsistency was suggested as a cause of LOLA’s failure to preserve stable fixed points (SFPs). First, we formalize consistency and show that higher-order LOLA (HOLA) solves LOLA’s inconsistency problem if it converges. Second, we correct a claim made in the literature by Sch{ä}fer and Anandkumar (2019), proving that Competitive Gradient Descent (CGD) does not recover HOLA as a series expansion (and fails to solve the consistency problem). Third, we propose a new method called Consistent LOLA (COLA), which learns update functions that are consistent under mutual opponent shaping. It requires no more than second-order derivatives and learns consistent update functions even when HOLA fails to converge. However, we also prove that even consistent update functions do not preserve SFPs, contradicting the hypothesis that this shortcoming is caused by LOLA’s inconsistency. 
Finally, in an empirical evaluation on a set of general-sum games, we find that COLA finds prosocial solutions and that it converges under a wider range of learning rates than HOLA and LOLA. We support the latter finding with a theoretical result for a simple game.' volume: 162 URL: https://proceedings.mlr.press/v162/willi22a.html PDF: https://proceedings.mlr.press/v162/willi22a/willi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-willi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Timon family: Willi - given: Alistair Hp family: Letcher - given: Johannes family: Treutlein - given: Jakob family: Foerster editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23804-23831 id: willi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23804 lastpage: 23831 published: 2022-06-28 00:00:00 +0000 - title: 'Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning' abstract: 'Continuous-time reinforcement learning offers an appealing formalism for describing control problems in which the passage of time is not naturally divided into discrete increments. Here we consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time, stochastic environment. Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, learning state representations, multiagent coordination, and more. We begin by establishing the distributional analogue of the Hamilton-Jacobi-Bellman (HJB) equation for Ito diffusions and the broader class of Feller-Dynkin processes. We then specialize this equation to the setting in which the return distribution is approximated by N uniformly-weighted particles, a common design choice in distributional algorithms. Our derivation highlights additional terms due to statistical diffusivity which arise from the proper handling of distributions in the continuous-time setting. Based on this, we propose a tractable algorithm for approximately solving the distributional HJB based on a JKO scheme, which can be implemented in an online, control algorithm. We demonstrate the effectiveness of such an algorithm in a synthetic control problem.' volume: 162 URL: https://proceedings.mlr.press/v162/wiltzer22a.html PDF: https://proceedings.mlr.press/v162/wiltzer22a/wiltzer22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wiltzer22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Harley E family: Wiltzer - given: David family: Meger - given: Marc G. family: Bellemare editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23832-23856 id: wiltzer22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23832 lastpage: 23856 published: 2022-06-28 00:00:00 +0000 - title: 'Easy Variational Inference for Categorical Models via an Independent Binary Approximation' abstract: 'We pursue tractable Bayesian analysis of generalized linear models (GLMs) for categorical data. 
GLMs have been difficult to scale to more than a few dozen categories due to non-conjugacy or strong posterior dependencies when using conjugate auxiliary variable methods. We define a new class of GLMs for categorical data called categorical-from-binary (CB) models. Each CB model has a likelihood that is bounded by the product of binary likelihoods, suggesting a natural posterior approximation. This approximation makes inference straightforward and fast; using well-known auxiliary variables for probit or logistic regression, the product of binary models admits conjugate closed-form variational inference that is embarrassingly parallel across categories and invariant to category ordering. Moreover, an independent binary model simultaneously approximates multiple CB models. Bayesian model averaging over these can improve the quality of the approximation for any given dataset. We show that our approach scales to thousands of categories, outperforming posterior estimation competitors like Automatic Differentiation Variational Inference (ADVI) and No U-Turn Sampling (NUTS) in the time required to achieve fixed prediction quality.' volume: 162 URL: https://proceedings.mlr.press/v162/wojnowicz22a.html PDF: https://proceedings.mlr.press/v162/wojnowicz22a/wojnowicz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wojnowicz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Michael T family: Wojnowicz - given: Shuchin family: Aeron - given: Eric L family: Miller - given: Michael family: Hughes editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23857-23896 id: wojnowicz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23857 lastpage: 23896 published: 2022-06-28 00:00:00 +0000 - title: 'Continual Learning with Guarantees via Weight Interval Constraints' abstract: 'We introduce a new training paradigm that enforces interval constraints on neural network parameter space to control forgetting. Contemporary Continual Learning (CL) methods focus on training neural networks efficiently from a stream of data, while reducing the negative impact of catastrophic forgetting, yet they do not provide any firm guarantees that network performance will not deteriorate uncontrollably over time. In this work, we show how to put bounds on forgetting by reformulating continual learning of a model as a continual contraction of its parameter space. To that end, we propose Hyperrectangle Training, a new training methodology where each task is represented by a hyperrectangle in the parameter space, fully contained in the hyperrectangles of the previous tasks. This formulation reduces the NP-hard CL problem back to polynomial time while providing full resilience against forgetting. We validate our claim by developing InterContiNet (Interval Continual Learning) algorithm which leverages interval arithmetic to effectively model parameter regions as hyperrectangles. Through experimental results, we show that our approach performs well in a continual learning setup without storing data from previous tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wolczyk22a.html PDF: https://proceedings.mlr.press/v162/wolczyk22a/wolczyk22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wolczyk22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Maciej family: Wołczyk - given: Karol family: Piczak - given: Bartosz family: Wójcik - given: Lukasz family: Pustelnik - given: Paweł family: Morawiecki - given: Jacek family: Tabor - given: Tomasz family: Trzcinski - given: Przemysław family: Spurek editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23897-23911 id: wolczyk22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23897 lastpage: 23911 published: 2022-06-28 00:00:00 +0000 - title: 'A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications' abstract: 'The collection of eye gaze information provides a window into many critical aspects of human cognition, health and behaviour. Additionally, many neuroscientific studies complement the behavioural information gained from eye tracking with the high temporal resolution and neurophysiological markers provided by electroencephalography (EEG). One of the essential eye-tracking software processing steps is the segmentation of the continuous data stream into events relevant to eye-tracking applications, such as saccades, fixations, and blinks. Here, we introduce DETRtime, a novel framework for time-series segmentation that creates ocular event detectors that do not require additionally recorded eye-tracking modality and rely solely on EEG data. Our end-to-end deep-learning-based framework brings recent advances in Computer Vision to the forefront of the times series segmentation of EEG data. DETRtime achieves state-of-the-art performance in ocular event detection across diverse eye-tracking experiment paradigms. In addition to that, we provide evidence that our model generalizes well in the task of EEG sleep stage segmentation.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wolf22a.html PDF: https://proceedings.mlr.press/v162/wolf22a/wolf22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wolf22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lukas family: Wolf - given: Ard family: Kastrati - given: Martyna B family: Plomecka - given: Jie-Ming family: Li - given: Dustin family: Klebe - given: Alexander family: Veicht - given: Roger family: Wattenhofer - given: Nicolas family: Langer editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23912-23932 id: wolf22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23912 lastpage: 23932 published: 2022-06-28 00:00:00 +0000 - title: 'Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time' abstract: 'We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the q-fold column-wise tensor product of q matrices using a nearly optimal number of samples, improving upon all previously known methods by poly(q) factors. Furthermore, for the important special case of the q-fold self-tensoring of a dataset, which is the feature matrix of the degree-q polynomial kernel, the leading term of our method’s runtime is proportional to the size of the dataset and has no dependence on q. Previous techniques either incur a poly(q) factor slowdown in their runtime or remove the dependence on q at the expense of having sub-optimal target dimension, and depend quadratically on the number of data-points in their runtime. Our sampling technique relies on a collection of q partially correlated random projections which can be simultaneously applied to a dataset X in total time that only depends on the size of X, and at the same time their q-fold Kronecker product acts as a near-isometry for any fixed vector in the column span of $X^{\otimes q}$. We also show that our sampling methods generalize to other classes of kernels beyond polynomial, such as Gaussian and Neural Tangent kernels.' volume: 162 URL: https://proceedings.mlr.press/v162/woodruff22a.html PDF: https://proceedings.mlr.press/v162/woodruff22a/woodruff22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-woodruff22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: David family: Woodruff - given: Amir family: Zandieh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23933-23964 id: woodruff22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23933 lastpage: 23964 published: 2022-06-28 00:00:00 +0000 - title: 'Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time' abstract: 'The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. 
In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs—we call the results “model soups.” When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet. The resulting ViT-G model, which attains 90.94% top-1 accuracy on ImageNet, achieved a new state of the art. Furthermore, we show that the model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks. Finally, we analytically relate the performance similarity of weight-averaging and logit-ensembling to flatness of the loss and confidence of the predictions, and validate this relation empirically. Code is available at https://github.com/mlfoundations/model-soups.' volume: 162 URL: https://proceedings.mlr.press/v162/wortsman22a.html PDF: https://proceedings.mlr.press/v162/wortsman22a/wortsman22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wortsman22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mitchell family: Wortsman - given: Gabriel family: Ilharco - given: Samir Ya family: Gadre - given: Rebecca family: Roelofs - given: Raphael family: Gontijo-Lopes - given: Ari S family: Morcos - given: Hongseok family: Namkoong - given: Ali family: Farhadi - given: Yair family: Carmon - given: Simon family: Kornblith - given: Ludwig family: Schmidt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23965-23998 id: wortsman22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23965 lastpage: 23998 published: 2022-06-28 00:00:00 +0000 - title: 'Metric-Fair Classifier Derandomization' abstract: 'We study the problem of classifier derandomization in machine learning: given a stochastic binary classifier $f: X \to [0,1]$, sample a deterministic classifier $\hat{f}: X \to \{0,1\}$ that approximates the output of $f$ in aggregate over any data distribution. Recent work revealed how to efficiently derandomize a stochastic classifier with strong output approximation guarantees, but at the cost of individual fairness — that is, if $f$ treated similar inputs similarly, $\hat{f}$ did not. In this paper, we initiate a systematic study of classifier derandomization with metric fairness guarantees. We show that the prior derandomization approach is almost maximally metric-unfair, and that a simple “random threshold” derandomization achieves optimal fairness preservation but with weaker output approximation. 
We then devise a derandomization procedure that provides an appealing tradeoff between these two: if $f$ is $\alpha$-metric fair according to a metric $d$ with a locality-sensitive hash (LSH) family, then our derandomized $\hat{f}$ is, with high probability, $O(\alpha)$-metric fair and a close approximation of $f$. We also prove generic results applicable to all (fair and unfair) classifier derandomization procedures, including a bias-variance decomposition and reductions between various notions of metric fairness.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22a.html PDF: https://proceedings.mlr.press/v162/wu22a/wu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jimmy family: Wu - given: Yatong family: Chen - given: Yang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 23999-24016 id: wu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 23999 lastpage: 24016 published: 2022-06-28 00:00:00 +0000 - title: 'Structural Entropy Guided Graph Hierarchical Pooling' abstract: 'Following the success of convolution on non-Euclidean space, the corresponding pooling approaches have also been validated on various tasks regarding graphs. However, because of the fixed compression ratio and stepwise pooling design, these hierarchical pooling methods still suffer from local structure damage and suboptimal problem. In this work, inspired by structural entropy, we propose a hierarchical pooling approach, SEP, to tackle the two issues. Specifically, without assigning the layer-specific compression ratio, a global optimization algorithm is designed to generate the cluster assignment matrices for pooling at once. Then, we present an illustration of the local structure damage from previous methods in reconstruction of ring and grid synthetic graphs. In addition to SEP, we further design two classification models, SEP-G and SEP-N for graph classification and node classification, respectively. The results show that SEP outperforms state-of-the-art graph pooling methods on graph classification benchmarks and obtains superior performance on node classifications.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22b.html PDF: https://proceedings.mlr.press/v162/wu22b/wu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junran family: Wu - given: Xueyuan family: Chen - given: Ke family: Xu - given: Shangzhe family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24017-24030 id: wu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 24017 lastpage: 24030 published: 2022-06-28 00:00:00 +0000 - title: 'Self-supervised Models are Good Teaching Assistants for Vision Transformers' abstract: 'Transformers have shown remarkable progress on computer vision tasks in the past year. 
Compared to their CNN counterparts, transformers usually need the help of distillation to achieve comparable results on medium- or small-sized datasets. Meanwhile, recent research finds that when transformers are trained in a supervised and a self-supervised manner, respectively, the captured patterns are quite different both qualitatively and quantitatively. These findings motivate us to introduce a self-supervised teaching assistant (SSTA) besides the commonly used supervised teacher to improve the performance of transformers. Specifically, we propose a head-level knowledge distillation method that selects the most important head of the supervised teacher and self-supervised teaching assistant, and lets the student mimic the attention distributions of these two heads, so as to make the student focus on the relationships between tokens deemed important by the teacher and the teaching assistant. Extensive experiments verify the effectiveness of SSTA and demonstrate that the proposed SSTA is a good complement to the supervised teacher. Meanwhile, some analytical experiments from multiple perspectives (e.g., prediction, shape bias, robustness, and transferability to downstream tasks) with supervised teachers, self-supervised teaching assistants and students are instructive and may inspire future research.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22c.html PDF: https://proceedings.mlr.press/v162/wu22c/wu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haiyan family: Wu - given: Yuting family: Gao - given: Yinqi family: Zhang - given: Shaohui family: Lin - given: Yuan family: Xie - given: Xing family: Sun - given: Ke family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24031-24042 id: wu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 24031 lastpage: 24042 published: 2022-06-28 00:00:00 +0000 - title: 'Characterizing and Overcoming the Greedy Nature of Learning in Multi-modal Deep Neural Networks' abstract: 'We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models’ generalization, as we observe empirically. To estimate the model’s dependence on each modality, we compute the gain in accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since the conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model’s generalization on three datasets: Colored MNIST, ModelNet40, and NVIDIA Dynamic Hand Gesture.'
volume: 162 URL: https://proceedings.mlr.press/v162/wu22d.html PDF: https://proceedings.mlr.press/v162/wu22d/wu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Nan family: Wu - given: Stanislaw family: Jastrzebski - given: Kyunghyun family: Cho - given: Krzysztof J family: Geras editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24043-24055 id: wu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 24043 lastpage: 24055 published: 2022-06-28 00:00:00 +0000 - title: 'Instrumental Variable Regression with Confounder Balancing' abstract: 'This paper considers the challenge of estimating treatment effects from observational data in the presence of unmeasured confounders. A popular way to address this challenge is to utilize an instrumental variable (IV) for two-stage regression, i.e., 2SLS and variants, but limited to the linear setting. Recently, many nonlinear IV regression variants were proposed to overcome it by regressing the treatment with IVs and observed confounders in stage 1, leading to the imbalance of the observed confounders in stage 2. In this paper, we propose a Confounder Balanced IV Regression (CB-IV) algorithm to jointly remove the bias from the unmeasured confounders and balance the observed confounders. To the best of our knowledge, this is the first work to combine confounder balancing in IV regression for treatment effect estimation. Theoretically, we re-define and solve the inverse problems for the response-outcome function. Experiments show that our algorithm outperforms the existing approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22e.html PDF: https://proceedings.mlr.press/v162/wu22e/wu22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Anpeng family: Wu - given: Kun family: Kuang - given: Bo family: Li - given: Fei family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24056-24075 id: wu22e issued: date-parts: - 2022 - 6 - 28 firstpage: 24056 lastpage: 24075 published: 2022-06-28 00:00:00 +0000 - title: 'MemSR: Training Memory-efficient Lightweight Model for Image Super-Resolution' abstract: 'Methods based on deep neural networks with a massive number of layers and skip-connections have made impressive improvements on single image super-resolution (SISR). The skip-connections in these complex models boost the performance at the cost of a large amount of memory. With the increase of camera resolution from 1 million pixels to 100 million pixels on mobile phones, the memory footprint of these algorithms also increases hundreds of times, which restricts the applicability of these models on memory-limited devices. A plain model consisting of a stack of 3{\texttimes}3 convolutions with ReLU, in contrast, has the highest memory efficiency but poorly performs on super-resolution. 
This paper aims at calculating a winning initialization from a complex teacher network for a plain student network, which can provide performance comparable to complex models. To this end, we convert the teacher model to an equivalent large plain model and derive the plain student’s initialization. We further improve the student’s performance through initialization-aware feature distillation. Extensive experiments suggest that the proposed method results in a model with a competitive trade-off between accuracy and speed at a much lower memory footprint than other state-of-the-art lightweight approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22f.html PDF: https://proceedings.mlr.press/v162/wu22f/wu22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kailu family: Wu - given: Chung-Kuei family: Lee - given: Kaisheng family: Ma editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24076-24092 id: wu22f issued: date-parts: - 2022 - 6 - 28 firstpage: 24076 lastpage: 24092 published: 2022-06-28 00:00:00 +0000 - title: 'Delay-Adaptive Step-sizes for Asynchronous Learning' abstract: 'In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most analysis results for the related asynchronous algorithms use an upper bound on the information delays in the system to determine learning rates. Not only are such bounds hard to obtain in advance, but they also result in unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize these to proximal incremental gradient descent and block coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22g.html PDF: https://proceedings.mlr.press/v162/wu22g/wu22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xuyang family: Wu - given: Sindri family: Magnusson - given: Hamid Reza family: Feyzmahdavian - given: Mikael family: Johansson editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24093-24113 id: wu22g issued: date-parts: - 2022 - 6 - 28 firstpage: 24093 lastpage: 24113 published: 2022-06-28 00:00:00 +0000 - title: 'Variational nearest neighbor Gaussian process' abstract: 'Variational approximations to Gaussian processes (GPs) typically use a small set of inducing points to form a low-rank approximation to the covariance matrix. In this work, we instead exploit a sparse approximation of the precision matrix. 
We propose variational nearest neighbor Gaussian process (VNNGP), which introduces a prior that only retains correlations within $K$ nearest-neighboring observations, thereby inducing a sparse precision structure. Using the variational framework, VNNGP’s objective can be factorized over both observations and inducing points, enabling stochastic optimization with a time complexity of $O(K^3)$. Hence, we can arbitrarily scale the inducing point size, even to the point of putting inducing points at every observed location. We compare VNNGP to other scalable GPs through various experiments, and demonstrate that VNNGP (1) can dramatically outperform low-rank methods, and (2) is less prone to overfitting than other nearest neighbor methods.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22h.html PDF: https://proceedings.mlr.press/v162/wu22h/wu22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Luhuan family: Wu - given: Geoff family: Pleiss - given: John P family: Cunningham editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24114-24130 id: wu22h issued: date-parts: - 2022 - 6 - 28 firstpage: 24114 lastpage: 24130 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach' abstract: 'The REINFORCE algorithm \cite{williams1992simple} is popular in policy gradient (PG) for solving reinforcement learning (RL) problems. Meanwhile, the theoretical form of PG is from \cite{sutton1999policy}. Although both formulae prescribe PG, their precise connection has not yet been clarified. Recently, \citeauthor{nota2020policy} (\citeyear{nota2020policy}) have found that the ambiguity causes implementation errors. Motivated by this ambiguity and the resulting implementation errors, we study PG from a perturbation perspective. In particular, we derive PG in a unified framework, precisely clarify the relation between PG implementation and theory, and echo the findings of \citeauthor{nota2020policy}. Examining the factors behind the empirical successes of the existing erroneous implementations, we find that a small approximation error and the experience replay mechanism play critical roles.'
volume: 162 URL: https://proceedings.mlr.press/v162/wu22i.html PDF: https://proceedings.mlr.press/v162/wu22i/wu22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shuang family: Wu - given: Ling family: Shi - given: Jun family: Wang - given: Guangjian family: Tian editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24131-24149 id: wu22i issued: date-parts: - 2022 - 6 - 28 firstpage: 24131 lastpage: 24149 published: 2022-06-28 00:00:00 +0000 - title: 'DAVINZ: Data Valuation using Deep Neural Networks at Initialization' abstract: 'Recent years have witnessed a surge of interest in developing trustworthy methods to evaluate the value of data in many real-world applications (e.g., collaborative machine learning, data marketplaces). Existing data valuation methods typically valuate data using the generalization performance of converged machine learning models after their long-term model training, hence making data valuation on large complex deep neural networks (DNNs) unaffordable. To this end, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. We then exploit this theoretically derived generalization bound to develop a novel training-free data valuation method named data valuation at initialization (DAVINZ) on DNNs, which consistently achieves remarkable effectiveness and efficiency in practice. Moreover, our training-free DAVINZ, surprisingly, can even theoretically and empirically enjoy the desirable properties that training-based data valuation methods usually attain, thus making it more trustworthy in practice.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22j.html PDF: https://proceedings.mlr.press/v162/wu22j/wu22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhaoxuan family: Wu - given: Yao family: Shu - given: Bryan Kian Hsiang family: Low editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24150-24176 id: wu22j issued: date-parts: - 2022 - 6 - 28 firstpage: 24150 lastpage: 24176 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Deep Reinforcement Learning through Bootstrapped Opportunistic Curriculum' abstract: 'Despite considerable advances in deep reinforcement learning, it has been shown to be highly vulnerable to adversarial perturbations to state observations. Recent efforts that have attempted to improve adversarial robustness of reinforcement learning can nevertheless tolerate only very small perturbations, and remain fragile as perturbation size increases. We propose Bootstrapped Opportunistic Adversarial Curriculum Learning (BCL), a novel flexible adversarial curriculum learning framework for robust reinforcement learning. 
Our framework combines two ideas: conservatively bootstrapping each curriculum phase with highest quality solutions obtained from multiple runs of the previous phase, and opportunistically skipping forward in the curriculum. In our experiments we show that the proposed BCL framework enables dramatic improvements in robustness of learned policies to adversarial perturbations. The greatest improvement is for Pong, where our framework yields robustness to perturbations of up to 25/255; in contrast, the best existing approach can only tolerate adversarial noise up to 5/255. Our code is available at: https://github.com/jlwu002/BCL.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22k.html PDF: https://proceedings.mlr.press/v162/wu22k/wu22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junlin family: Wu - given: Yevgeniy family: Vorobeychik editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24177-24211 id: wu22k issued: date-parts: - 2022 - 6 - 28 firstpage: 24177 lastpage: 24211 published: 2022-06-28 00:00:00 +0000 - title: 'Revisiting Consistency Regularization for Deep Partial Label Learning' abstract: 'Partial label learning (PLL), which refers to the classification task where each training instance is ambiguously annotated with a set of candidate labels, has been recently studied in deep learning paradigm. Despite advances in recent deep PLL literature, existing methods (e.g., methods based on self-training or contrastive learning) are confronted with either ineffectiveness or inefficiency. In this paper, we revisit a simple idea namely consistency regularization, which has been shown effective in traditional PLL literature, to guide the training of deep models. Towards this goal, a new regularized training framework, which performs supervised learning on non-candidate labels and employs consistency regularization on candidate labels, is proposed for PLL. We instantiate the regularization term by matching the outputs of multiple augmentations of an instance to a conformal label distribution, which can be adaptively inferred by the closed-form solution. Experiments on benchmark datasets demonstrate the superiority of the proposed method compared with other state-of-the-art methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/wu22l.html PDF: https://proceedings.mlr.press/v162/wu22l/wu22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dong-Dong family: Wu - given: Deng-Bao family: Wang - given: Min-Ling family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24212-24225 id: wu22l issued: date-parts: - 2022 - 6 - 28 firstpage: 24212 lastpage: 24225 published: 2022-06-28 00:00:00 +0000 - title: 'Flowformer: Linearizing Transformers with Conservation Flows' abstract: 'Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has a quadratic complexity, significantly impeding Transformers from dealing with numerous tokens and scaling up to bigger models. Previous methods mainly utilize the similarity decomposition and the associativity of matrix multiplication to devise linear-time attention mechanisms. They avoid degeneration of attention to a trivial distribution by reintroducing inductive biases such as the locality, thereby at the expense of model generality and expressiveness. In this paper, we linearize Transformers free from specific inductive biases based on the flow network theory. We cast attention as the information flow aggregated from the sources (values) to the sinks (results) through the learned flow capacities (attentions). Within this framework, we apply the property of flow conservation into attention and propose the Flow-Attention mechanism of linear complexity. By respectively conserving the incoming flow of sinks for source competition and the outgoing flow of sources for sink allocation, Flow-Attention inherently generates informative attentions without using specific inductive biases. Empowered by the Flow-Attention, Flowformer yields strong performance in linear time for wide areas, including long sequence, time series, vision, natural language, and reinforcement learning. The code and settings are available at this repository: https://github.com/thuml/Flowformer.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22m.html PDF: https://proceedings.mlr.press/v162/wu22m/wu22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haixu family: Wu - given: Jialong family: Wu - given: Jiehui family: Xu - given: Jianmin family: Wang - given: Mingsheng family: Long editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24226-24242 id: wu22m issued: date-parts: - 2022 - 6 - 28 firstpage: 24226 lastpage: 24242 published: 2022-06-28 00:00:00 +0000 - title: 'Nearly Optimal Policy Optimization with Stable at Any Time Guarantee' abstract: 'Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. However, theoretical understanding of these methods remains insufficient. 
Even in the episodic (time-inhomogeneous) tabular setting, the state-of-the-art theoretical result for policy-based methods in Shani et al. (2020) is only $\tilde{O}(\sqrt{S^2AH^4K})$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the horizon, and $K$ is the number of episodes, and there is a $\sqrt{SH}$ gap compared with the information-theoretic lower bound $\tilde{\Omega}(\sqrt{SAH^3K})$ (Jin et al., 2018). To bridge this gap, we propose a novel algorithm, Reference-based Policy Optimization with Stable at Any Time guarantee (RPO-SAT), which features the property “Stable at Any Time”. We prove that our algorithm achieves $\tilde{O}(\sqrt{SAH^3K} + \sqrt{AH^4K})$ regret. When $S > H$, our algorithm is minimax optimal when ignoring logarithmic factors. To the best of our knowledge, RPO-SAT is the first computationally efficient, nearly minimax optimal policy-based algorithm for tabular RL.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22n.html PDF: https://proceedings.mlr.press/v162/wu22n/wu22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianhao family: Wu - given: Yunchang family: Yang - given: Han family: Zhong - given: Liwei family: Wang - given: Simon family: Du - given: Jiantao family: Jiao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24243-24265 id: wu22n issued: date-parts: - 2022 - 6 - 28 firstpage: 24243 lastpage: 24265 published: 2022-06-28 00:00:00 +0000 - title: 'RetrievalGuard: Provably Robust 1-Nearest Neighbor Image Retrieval' abstract: 'Recent works have shown that image retrieval models are vulnerable to adversarial attacks, where slightly modified test inputs could lead to problematic retrieval results. In this paper, we aim to design a provably robust image retrieval model which keeps the most important evaluation metric, Recall@1, invariant to adversarial perturbation. We propose the first 1-nearest neighbor (NN) image retrieval algorithm, RetrievalGuard, which is provably robust against adversarial perturbations within an $\ell_2$ ball of calculable radius. The challenge is to design a provably robust algorithm that takes into consideration the 1-NN search and the high-dimensional nature of the embedding space. Algorithmically, given a base retrieval model and a query sample, we build a smoothed retrieval model by carefully analyzing the 1-NN search procedure in the high-dimensional embedding space. We show that the smoothed retrieval model has a bounded Lipschitz constant and thus the retrieval score is invariant to $\ell_2$ adversarial perturbations. Experiments on image retrieval tasks validate the robustness of our RetrievalGuard method.'
volume: 162 URL: https://proceedings.mlr.press/v162/wu22o.html PDF: https://proceedings.mlr.press/v162/wu22o/wu22o.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22o.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yihan family: Wu - given: Hongyang family: Zhang - given: Heng family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24266-24279 id: wu22o issued: date-parts: - 2022 - 6 - 28 firstpage: 24266 lastpage: 24279 published: 2022-06-28 00:00:00 +0000 - title: 'Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression' abstract: 'Stochastic gradient descent (SGD) has been shown to generalize well in many deep learning applications. In practice, one often runs SGD with a geometrically decaying stepsize, i.e., a constant initial stepsize followed by multiple geometric stepsize decay, and uses the last iterate as the output. This kind of SGD is known to be nearly minimax optimal for classical finite-dimensional linear regression problems (Ge et al., 2019). However, a sharp analysis for the last iterate of SGD in the overparameterized setting is still open. In this paper, we provide a problem-dependent analysis on the last iterate risk bounds of SGD with decaying stepsize, for (overparameterized) linear regression problems. In particular, for last iterate SGD with (tail) geometrically decaying stepsize, we prove nearly matching upper and lower bounds on the excess risk. Moreover, we provide an excess risk lower bound for last iterate SGD with polynomially decaying stepsize and demonstrate the advantage of geometrically decaying stepsize in an instance-wise manner, which complements the minimax rate comparison made in prior work.' volume: 162 URL: https://proceedings.mlr.press/v162/wu22p.html PDF: https://proceedings.mlr.press/v162/wu22p/wu22p.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-wu22p.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jingfeng family: Wu - given: Difan family: Zou - given: Vladimir family: Braverman - given: Quanquan family: Gu - given: Sham family: Kakade editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24280-24314 id: wu22p issued: date-parts: - 2022 - 6 - 28 firstpage: 24280 lastpage: 24314 published: 2022-06-28 00:00:00 +0000 - title: 'Optimal Clustering with Noisy Queries via Multi-Armed Bandit' abstract: 'Motivated by many applications, we study clustering with a faulty oracle. In this problem, there are $n$ items belonging to $k$ unknown clusters, and the algorithm is allowed to ask the oracle whether two items belong to the same cluster or not. However, the answer from the oracle is correct only with probability $\frac{1}{2}+\frac{\delta}{2}$. The goal is to recover the hidden clusters with minimum number of noisy queries. 
Previous works have shown that the problem can be solved with $O(\frac{nk\log n}{\delta^2} + \text{poly}(k,\frac{1}{\delta}, \log n))$ queries, while $\Omega(\frac{nk}{\delta^2})$ queries are known to be necessary. So, for any values of $k$ and $\delta$, there is still a non-trivial gap between upper and lower bounds. In this work, we obtain the first matching upper and lower bounds for a wide range of parameters. In particular, a new polynomial time algorithm with $O(\frac{n(k+\log n)}{\delta^2} + \text{poly}(k,\frac{1}{\delta}, \log n))$ queries is proposed. Moreover, we prove a new lower bound of $\Omega(\frac{n\log n}{\delta^2})$, which, combined with the existing $\Omega(\frac{nk}{\delta^2})$ bound, matches our upper bound up to an additive $\text{poly}(k,\frac{1}{\delta},\log n)$ term. To obtain the new results, our main ingredient is an interesting connection between our problem and multi-armed bandits, which might provide useful insights for other similar problems.' volume: 162 URL: https://proceedings.mlr.press/v162/xia22a.html PDF: https://proceedings.mlr.press/v162/xia22a/xia22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xia22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jinghui family: Xia - given: Zengfeng family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24315-24331 id: xia22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24315 lastpage: 24331 published: 2022-06-28 00:00:00 +0000 - title: 'ProGCL: Rethinking Hard Negative Mining in Graph Contrastive Learning' abstract: 'Contrastive Learning (CL) has emerged as a dominant technique for unsupervised representation learning, which embeds augmented versions of the anchor close to each other (positive samples) and pushes the embeddings of other samples (negatives) apart. As revealed in recent studies, CL can benefit from hard negatives (negatives that are most similar to the anchor). However, we observe limited benefits when we adopt existing hard negative mining techniques from other domains in Graph Contrastive Learning (GCL). We perform both experimental and theoretical analyses of this phenomenon and find that it can be attributed to the message passing of Graph Neural Networks (GNNs). Unlike CL in other domains, most hard negatives are potentially false negatives (negatives that share the same class with the anchor) if they are selected merely according to their similarities to the anchor, which will undesirably push away samples of the same class. To remedy this deficiency, we propose an effective method, dubbed \textbf{ProGCL}, to estimate the probability of a negative being a true one, which constitutes a more suitable measure of negatives’ hardness together with similarity. Additionally, we devise two schemes (i.e., \textbf{ProGCL-weight} and \textbf{ProGCL-mix}) to boost the performance of GCL. Extensive experiments demonstrate that ProGCL brings notable and consistent improvements over base GCL methods and yields multiple state-of-the-art results on several unsupervised benchmarks, and even exceeds the performance of supervised ones. Also, ProGCL is readily pluggable into various negatives-based GCL methods for performance improvement. 
We release the code at https://github.com/junxia97/ProGCL.' volume: 162 URL: https://proceedings.mlr.press/v162/xia22b.html PDF: https://proceedings.mlr.press/v162/xia22b/xia22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xia22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jun family: Xia - given: Lirong family: Wu - given: Ge family: Wang - given: Jintao family: Chen - given: Stan Z. family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24332-24346 id: xia22b issued: date-parts: - 2022 - 6 - 28 firstpage: 24332 lastpage: 24346 published: 2022-06-28 00:00:00 +0000 - title: 'Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm' abstract: 'Although learning in high dimensions is commonly believed to suffer from the curse of dimensionality, modern machine learning methods often exhibit an astonishing power to tackle a wide range of challenging real-world learning problems without using abundant amounts of data. How exactly these methods break this curse remains a fundamental open question in the theory of deep learning. While previous efforts have investigated this question by studying the data ($\mathcal D$), model ($\mathcal M$), and inference algorithm ($\mathcal I$) as independent modules, in this paper we analyze the triplet $(\mathcal D, \mathcal M, \mathcal I)$ as an integrated system and identify important synergies that help mitigate the curse of dimensionality. We first study the basic symmetries associated with various learning algorithms ($\mathcal M, \mathcal I$), focusing on four prototypical architectures in deep learning: fully-connected networks, locally-connected networks, and convolutional networks with and without pooling. We find that learning is most efficient when these symmetries are compatible with those of the data distribution and that performance significantly deteriorates when any member of the $(\mathcal D, \mathcal M, \mathcal I)$ triplet is inconsistent or suboptimal.' volume: 162 URL: https://proceedings.mlr.press/v162/xiao22a.html PDF: https://proceedings.mlr.press/v162/xiao22a/xiao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xiao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lechao family: Xiao - given: Jeffrey family: Pennington editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24347-24369 id: xiao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24347 lastpage: 24369 published: 2022-06-28 00:00:00 +0000 - title: 'Identification of Linear Non-Gaussian Latent Hierarchical Structure' abstract: 'Traditional causal discovery methods mainly focus on estimating causal relations among measured variables, but in many real-world problems, such as questionnaire-based psychometric studies, measured variables are generated by latent variables that are causally related. 
Accordingly, this paper investigates the problem of discovering the hidden causal variables and estimating the causal structure, including both the causal relations among latent variables and those between latent and measured variables. We relax the frequently-used measurement assumption and allow the children of latent variables to be latent as well, and hence deal with a specific type of latent hierarchical causal structure. In particular, we define a minimal latent hierarchical structure and show that for linear non-Gaussian models with the minimal latent hierarchical structure, the whole structure is identifiable from only the measured variables. Moreover, we develop a principled method to identify the structure by testing for Generalized Independent Noise (GIN) conditions in specific ways. Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach.' volume: 162 URL: https://proceedings.mlr.press/v162/xie22a.html PDF: https://proceedings.mlr.press/v162/xie22a/xie22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xie22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Feng family: Xie - given: Biwei family: Huang - given: Zhengming family: Chen - given: Yangbo family: He - given: Zhi family: Geng - given: Kun family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24370-24387 id: xie22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24370 lastpage: 24387 published: 2022-06-28 00:00:00 +0000 - title: 'COAT: Measuring Object Compositionality in Emergent Representations' abstract: 'Learning representations that can decompose a multi-object scene into its constituent objects and recompose them flexibly is desirable for object-oriented reasoning and planning. Built upon object masks in the pixel space, existing metrics for objectness can only evaluate generative models with an object-specific “slot” structure. We propose to directly measure compositionality in the representation space as a form of objectness, making such evaluations tractable for a wider class of models. Our metric, COAT (Compositional Object Algebra Test), evaluates whether a generic representation exhibits certain geometric properties that underpin object compositionality beyond what is already captured by the raw pixel space. Our experiments on the popular CLEVR (Johnson et al., 2018) domain reveal that existing disentanglement-based generative models are not as compositional as one might expect, suggesting room for further modeling improvements. We hope our work allows for a unified evaluation of object-centric representations, spanning generative as well as discriminative, self-supervised models.' 
volume: 162 URL: https://proceedings.mlr.press/v162/xie22b.html PDF: https://proceedings.mlr.press/v162/xie22b/xie22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xie22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sirui family: Xie - given: Ari S family: Morcos - given: Song-Chun family: Zhu - given: Ramakrishna family: Vedantam editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24388-24413 id: xie22b issued: date-parts: - 2022 - 6 - 28 firstpage: 24388 lastpage: 24413 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Policy Learning over Multiple Uncertainty Sets' abstract: 'Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL, which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.' volume: 162 URL: https://proceedings.mlr.press/v162/xie22c.html PDF: https://proceedings.mlr.press/v162/xie22c/xie22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xie22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Annie family: Xie - given: Shagun family: Sodhani - given: Chelsea family: Finn - given: Joelle family: Pineau - given: Amy family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24414-24429 id: xie22c issued: date-parts: - 2022 - 6 - 28 firstpage: 24414 lastpage: 24429 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum' abstract: 'Adaptive Moment Estimation (Adam), which combines Adaptive Learning Rate and Momentum, is arguably the most popular stochastic optimizer for accelerating the training of deep neural networks. However, it is empirically known that Adam often generalizes worse than Stochastic Gradient Descent (SGD). The purpose of this paper is to unveil the mystery of this behavior in the diffusion theoretical framework. Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and flat minima selection. 
We prove that Adaptive Learning Rate can escape saddle points efficiently, but cannot select flat minima as SGD does. In contrast, Momentum provides a drift effect to help the training process pass through saddle points, and almost does not affect flat minima selection. This partly explains why SGD (with Momentum) generalizes better, while Adam generalizes worse but converges faster. Furthermore, motivated by the analysis, we design a novel adaptive optimization framework named Adaptive Inertia, which uses parameter-wise adaptive inertia to accelerate the training and provably favors flat minima as well as SGD. Our extensive experiments demonstrate that the proposed adaptive inertia method can generalize significantly better than SGD and conventional adaptive gradient methods.' volume: 162 URL: https://proceedings.mlr.press/v162/xie22d.html PDF: https://proceedings.mlr.press/v162/xie22d/xie22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xie22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zeke family: Xie - given: Xinrui family: Wang - given: Huishuai family: Zhang - given: Issei family: Sato - given: Masashi family: Sugiyama editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24430-24459 id: xie22d issued: date-parts: - 2022 - 6 - 28 firstpage: 24430 lastpage: 24459 published: 2022-06-28 00:00:00 +0000 - title: 'Self-Supervised Representation Learning via Latent Graph Prediction' abstract: 'Self-supervised learning (SSL) of graph neural networks is emerging as a promising way of leveraging unlabeled data. Currently, most methods are based on contrastive learning adapted from the image domain, which requires view generation and a sufficient number of negative samples. In contrast, existing predictive models do not require negative sampling, but lack theoretical guidance on the design of pretext training tasks. In this work, we propose the LaGraph, a theoretically grounded predictive SSL framework based on latent graph prediction. Learning objectives of LaGraph are derived as self-supervised upper bounds to objectives for predicting unobserved latent graphs. In addition to its improved performance, LaGraph provides explanations for recent successes of predictive models that include invariance-based objectives. We provide theoretical analysis comparing LaGraph to related methods in different domains. Our experimental results demonstrate the superiority of LaGraph in performance and the robustness to decreasing of training sample size on both graph-level and node-level tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/xie22e.html PDF: https://proceedings.mlr.press/v162/xie22e/xie22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xie22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yaochen family: Xie - given: Zhao family: Xu - given: Shuiwang family: Ji editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24460-24477 id: xie22e issued: date-parts: - 2022 - 6 - 28 firstpage: 24460 lastpage: 24477 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Computation of Higher-Order Subgraph Attribution via Message Passing' abstract: 'Explaining graph neural networks (GNNs) has become more and more important recently. Higher-order interpretation schemes, such as GNN-LRP (layer-wise relevance propagation for GNN), emerged as powerful tools for unraveling how different features interact thereby contributing to explaining GNNs. GNN-LRP gives a relevance attribution of walks between nodes at each layer, and the subgraph attribution is expressed as a sum over exponentially many such walks. In this work, we demonstrate that such exponential complexity can be avoided. In particular, we propose novel algorithms that enable to attribute subgraphs with GNN-LRP in linear-time (w.r.t. the network depth). Our algorithms are derived via message passing techniques that make use of the distributive property, thereby directly computing quantities for higher-order explanations. We further adapt our efficient algorithms to compute a generalization of subgraph attributions that also takes into account the neighboring graph features. Experimental results show the significant acceleration of the proposed algorithms and demonstrate the high usefulness and scalability of our novel generalized subgraph attribution method.' volume: 162 URL: https://proceedings.mlr.press/v162/xiong22a.html PDF: https://proceedings.mlr.press/v162/xiong22a/xiong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xiong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ping family: Xiong - given: Thomas family: Schnake - given: Grégoire family: Montavon - given: Klaus-Robert family: Müller - given: Shinichi family: Nakajima editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24478-24495 id: xiong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24478 lastpage: 24495 published: 2022-06-28 00:00:00 +0000 - title: 'A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games' abstract: 'Existing studies on provably efficient algorithms for Markov games (MGs) almost exclusively build on the “optimism in the face of uncertainty” (OFU) principle. This work focuses on a distinct approach of posterior sampling, which is celebrated in many bandits and reinforcement learning settings but remains under-explored for MGs. Specifically, for episodic two-player zero-sum MGs, a novel posterior sampling algorithm is developed with general function approximation. 
Theoretical analysis demonstrates that the posterior sampling algorithm admits a $\sqrt{T}$-regret bound for problems with a low multi-agent decoupling coefficient, which is a new complexity measure for MGs, where $T$ denotes the number of episodes. When specializing to linear MGs, the obtained regret bound matches the state-of-the-art results. To the best of our knowledge, this is the first provably efficient posterior sampling algorithm for MGs with frequentist regret guarantees, which extends the toolbox for MGs and promotes the broad applicability of posterior sampling.' volume: 162 URL: https://proceedings.mlr.press/v162/xiong22b.html PDF: https://proceedings.mlr.press/v162/xiong22b/xiong22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xiong22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wei family: Xiong - given: Han family: Zhong - given: Chengshuai family: Shi - given: Cong family: Shen - given: Tong family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24496-24523 id: xiong22b issued: date-parts: - 2022 - 6 - 28 firstpage: 24496 lastpage: 24523 published: 2022-06-28 00:00:00 +0000 - title: 'Importance Weighted Kernel Bayes’ Rule' abstract: 'We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected posterior features, based on regression from kernel or neural net features of the observations. All quantities involved in the Bayesian update are learned from observed data, making the method entirely model-free. The resulting algorithm is a novel instance of a kernel Bayes’ rule (KBR). Our approach is based on importance weighting, which results in superior numerical stability to the existing approach to KBR, which requires operator inversion. We show the convergence of the estimator using a novel consistency analysis on the importance weighting estimator in the infinity norm. We evaluate our KBR on challenging synthetic benchmarks, including a filtering problem with a state-space model involving high dimensional image observations. The proposed method yields uniformly better empirical performance than the existing KBR, and competitive performance with other competing methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/xu22a.html PDF: https://proceedings.mlr.press/v162/xu22a/xu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liyuan family: Xu - given: Yutian family: Chen - given: Arnaud family: Doucet - given: Arthur family: Gretton editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24524-24538 id: xu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24524 lastpage: 24538 published: 2022-06-28 00:00:00 +0000 - title: 'Learning to Separate Voices by Spatial Regions' abstract: 'We consider the problem of audio voice separation for binaural applications, such as earphones and hearing aids. While today’s neural networks perform remarkably well (separating 4+ sources with 2 microphones) they assume a known or fixed maximum number of sources, K. Moreover, today’s models are trained in a supervised manner, using training data synthesized from generic sources, environments, and human head shapes. This paper intends to relax both these constraints at the expense of a slight alteration in the problem definition. We observe that, when a received mixture contains too many sources, it is still helpful to separate them by region, i.e., isolating signal mixtures from each conical sector around the user’s head. This requires learning the fine-grained spatial properties of each region, including the signal distortions imposed by a person’s head. We propose a two-stage self-supervised framework in which overheard voices from earphones are pre-processed to extract relatively clean personalized signals, which are then used to train a region-wise separation model. Results show promising performance, underscoring the importance of personalization over a generic supervised approach. (audio samples available at our project website: https://uiuc-earable-computing.github.io/binaural). We believe this result could help real-world applications in selective hearing, noise cancellation, and audio augmented reality.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22b.html PDF: https://proceedings.mlr.press/v162/xu22b/xu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Alan family: Xu - given: Romit Roy family: Choudhury editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24539-24549 id: xu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 24539 lastpage: 24549 published: 2022-06-28 00:00:00 +0000 - title: 'Detached Error Feedback for Distributed SGD with Random Sparsification' abstract: 'The communication bottleneck has been a critical problem in large-scale distributed deep learning. In this work, we study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. 
To tackle this important issue, we improve communication-efficient distributed SGD from a novel aspect, namely the trade-off between the variance and second moment of the gradient. With this motivation, we propose a new detached error feedback (DEF) algorithm, which enjoys a better convergence bound than error feedback for non-convex problems. We also propose DEF-A to accelerate the generalization of DEF at the early stages of training, which enjoys better generalization bounds than DEF. Furthermore, we establish the connection between communication-efficient distributed SGD and SGD with iterate averaging (SGD-IA) for the first time. Extensive deep learning experiments show significant empirical improvement of the proposed methods under various settings. Our reproducible codes and scripts for all experiments in this work will be made publicly available.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22c.html PDF: https://proceedings.mlr.press/v162/xu22c/xu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: An family: Xu - given: Heng family: Huang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24550-24575 id: xu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 24550 lastpage: 24575 published: 2022-06-28 00:00:00 +0000 - title: 'Accurate Quantization of Measures via Interacting Particle-based Optimization' abstract: 'Approximating a target probability distribution can be cast as an optimization problem where the objective functional measures the dissimilarity to the target. This optimization can be addressed by approximating Wasserstein and related gradient flows. In practice, these are simulated by interacting particle systems, whose stationary states define an empirical measure approximating the target distribution. This approach has been popularized recently to design sampling algorithms, e.g. Stein Variational Gradient Descent, or by minimizing the Maximum Mean or Kernel Stein Discrepancy. However, little is known about the quantization properties of these approaches, i.e. how well the target is approximated by a finite number of particles. We investigate this question theoretically and numerically. In particular, we prove general upper bounds on the quantization error of MMD and KSD at rates which significantly outperform quantization by i.i.d. samples. We conduct experiments which show that the particle systems under study achieve fast rates in practice, and notably outperform greedy algorithms, such as kernel herding. We compare different gradient flows and highlight their quantization rates. Furthermore, we introduce a Normalized Stein Variational Gradient Descent and argue in favor of adaptive kernels, which exhibit faster convergence. Finally, we compare the Gaussian and Laplace kernels and argue that the Laplace kernel provides a more robust quantization.' 
volume: 162 URL: https://proceedings.mlr.press/v162/xu22d.html PDF: https://proceedings.mlr.press/v162/xu22d/xu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lantian family: Xu - given: Anna family: Korba - given: Dejan family: Slepcev editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24576-24595 id: xu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 24576 lastpage: 24595 published: 2022-06-28 00:00:00 +0000 - title: 'Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces' abstract: 'We introduce a unified framework for group equivariant networks on homogeneous spaces derived from a Fourier perspective. We consider tensor-valued feature fields, before and after a convolutional layer. We present a unified derivation of kernels via the Fourier domain by leveraging the sparsity of Fourier coefficients of the lifted feature fields. The sparsity emerges when the stabilizer subgroup of the homogeneous space is a compact Lie group. We further introduce a nonlinear activation, via an elementwise nonlinearity on the regular representation after lifting and projecting back to the field through an equivariant convolution. We show that other methods treating features as the Fourier coefficients in the stabilizer subgroup are special cases of our activation. Experiments on $SO(3)$ and $SE(3)$ show state-of-the-art performance in spherical vector field regression, point cloud classification, and molecular completion.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22e.html PDF: https://proceedings.mlr.press/v162/xu22e/xu22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yinshuang family: Xu - given: Jiahui family: Lei - given: Edgar family: Dobriban - given: Kostas family: Daniilidis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24596-24614 id: xu22e issued: date-parts: - 2022 - 6 - 28 firstpage: 24596 lastpage: 24614 published: 2022-06-28 00:00:00 +0000 - title: 'Inferring Cause and Effect in the Presence of Heteroscedastic Noise' abstract: 'We study the problem of identifying cause and effect over two univariate continuous variables $X$ and $Y$ from a sample of their joint distribution. Our focus lies on the setting when the variance of the noise may be dependent on the cause. We propose to partition the domain of the cause into multiple segments where the noise indeed is dependent. To this end, we minimize a scale-invariant, penalized regression score, finding the optimal partitioning using dynamic programming. We show under which conditions this allows us to identify the causal direction for the linear setting with heteroscedastic noise, for the non-linear setting with homoscedastic noise, as well as empirically confirm that these results generalize to the non-linear and heteroscedastic case. 
Altogether, the ability to model heteroscedasticity translates into improved performance in telling cause from effect on a wide range of synthetic and real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22f.html PDF: https://proceedings.mlr.press/v162/xu22f/xu22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sascha family: Xu - given: Osman A family: Mian - given: Alexander family: Marx - given: Jilles family: Vreeken editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24615-24630 id: xu22f issued: date-parts: - 2022 - 6 - 28 firstpage: 24615 lastpage: 24630 published: 2022-06-28 00:00:00 +0000 - title: 'Prompting Decision Transformer for Few-Shot Policy Generalization' abstract: 'Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architecture inductive bias on the few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations and encodes task-specific information to guide policy generation. Our experiments on five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to prompt length changes and can generalize to out-of-distribution (OOD) environments. Project page: \href{https://mxu34.github.io/PromptDT/}{https://mxu34.github.io/PromptDT/}.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22g.html PDF: https://proceedings.mlr.press/v162/xu22g/xu22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mengdi family: Xu - given: Yikang family: Shen - given: Shun family: Zhang - given: Yuchen family: Lu - given: Ding family: Zhao - given: Joshua family: Tenenbaum - given: Chuang family: Gan editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24631-24645 id: xu22g issued: date-parts: - 2022 - 6 - 28 firstpage: 24631 lastpage: 24645 published: 2022-06-28 00:00:00 +0000 - title: 'Analyzing and Mitigating Interference in Neural Architecture Search' abstract: 'Weight sharing is a popular approach to reduce the training cost of neural architecture search (NAS) by reusing the weights of shared operators from previously trained child models. 
However, the rank correlation between the estimated accuracy and the ground-truth accuracy of those child models is low due to the interference among different child models caused by weight sharing. In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe that: 1) the interference on a shared operator between two child models is positively correlated with the number of different operators between them; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar. Inspired by these two observations, we propose two approaches to mitigate the interference: 1) rather than randomly sampling child models for optimization, we propose a gradual modification scheme by modifying one operator between adjacent optimization steps to minimize the interference on the shared operators; 2) forcing the inputs and outputs of the operator across all child models to be similar to reduce the interference. Experiments on a BERT search space verify that mitigating interference via each of our proposed methods improves the rank correlation of the supernet, and combining both methods achieves better results. Our discovered architecture outperforms RoBERTa$_{\rm base}$ by 1.1 and 0.6 points and ELECTRA$_{\rm base}$ by 1.6 and 1.1 points on the dev and test sets of the GLUE benchmark. Extensive results on the BERT compression, reading comprehension and large-scale image classification tasks also demonstrate the effectiveness and generality of our proposed methods.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22h.html PDF: https://proceedings.mlr.press/v162/xu22h/xu22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jin family: Xu - given: Xu family: Tan - given: Kaitao family: Song - given: Renqian family: Luo - given: Yichong family: Leng - given: Tao family: Qin - given: Tie-Yan family: Liu - given: Jian family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24646-24662 id: xu22h issued: date-parts: - 2022 - 6 - 28 firstpage: 24646 lastpage: 24662 published: 2022-06-28 00:00:00 +0000 - title: 'On the Statistical Benefits of Curriculum Learning' abstract: 'Curriculum learning (CL) is a commonly used machine learning training strategy. However, we still lack a clear theoretical understanding of CL’s benefits. In this paper, we study the benefits of CL in the multitask linear regression problem under both structured and unstructured settings. For both settings, we derive the minimax rates for CL with the oracle that provides the optimal curriculum and without the oracle, where the agent has to adaptively learn a good curriculum. Our results reveal that adaptive learning can be fundamentally harder than oracle learning in the unstructured setting, but it merely introduces a small extra term in the structured setting. To connect theory with practice, we provide justification for a popular empirical method that selects tasks with the highest local prediction gain by comparing its guarantees with the minimax rates mentioned above.' 
volume: 162 URL: https://proceedings.mlr.press/v162/xu22i.html PDF: https://proceedings.mlr.press/v162/xu22i/xu22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ziping family: Xu - given: Ambuj family: Tewari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24663-24682 id: xu22i issued: date-parts: - 2022 - 6 - 28 firstpage: 24663 lastpage: 24682 published: 2022-06-28 00:00:00 +0000 - title: 'A Difference Standardization Method for Mutual Transfer Learning' abstract: 'In many real-world applications, mutual transfer learning is the paradigm that each data domain can potentially be a source or target domain. This is quite different from transfer learning tasks where the source and target are known a priori. However, previous studies about mutual transfer learning either suffer from high computational complexity or oversimplified hypothesis. To overcome these challenges, in this paper, we propose the \underline{Diff}erence \underline{S}tandardization method ({\bf DiffS}) for mutual transfer learning. Specifically, we put forward a novel distance metric between domains, the standardized domain difference, to obtain fast structure recovery and accurate parameter estimation simultaneously. We validate the method’s performance using both synthetic and real-world data. Compared to previous methods, DiffS demonstrates a speed-up of approximately 3000 times that of similar methods and achieves the same accurate learnability structure estimation.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22j.html PDF: https://proceedings.mlr.press/v162/xu22j/xu22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haoqing family: Xu - given: Meng family: Wang - given: Beilun family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24683-24697 id: xu22j issued: date-parts: - 2022 - 6 - 28 firstpage: 24683 lastpage: 24697 published: 2022-06-28 00:00:00 +0000 - title: 'SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks' abstract: 'We present SkexGen, a novel autoregressive generative model for computer-aided design (CAD) construction sequences containing sketch-and-extrude modeling operations. Our model utilizes distinct Transformer architectures to encode topological, geometric, and extrusion variations of construction sequences into disentangled codebooks. Autoregressive Transformer decoders generate CAD construction sequences sharing certain properties specified by the codebook vectors. Extensive experiments demonstrate that our disentangled codebook representation generates diverse and high-quality CAD models, enhances user control, and enables efficient exploration of the design space. The code is available at https://samxuxiang.github.io/skexgen.' 
volume: 162 URL: https://proceedings.mlr.press/v162/xu22k.html PDF: https://proceedings.mlr.press/v162/xu22k/xu22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiang family: Xu - given: Karl D.D. family: Willis - given: Joseph G family: Lambourne - given: Chin-Yi family: Cheng - given: Pradeep Kumar family: Jayaraman - given: Yasutaka family: Furukawa editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24698-24724 id: xu22k issued: date-parts: - 2022 - 6 - 28 firstpage: 24698 lastpage: 24724 published: 2022-06-28 00:00:00 +0000 - title: 'Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations' abstract: 'We study the problem of offline Imitation Learning (IL) where an agent aims to learn an optimal expert behavior policy without additional online environment interactions. Instead, the agent is provided with a supplementary offline dataset from suboptimal behaviors. Prior works that address this problem either require that expert data occupies the majority proportion of the offline dataset, or need to learn a reward function and perform offline reinforcement learning (RL) afterwards. In this paper, we aim to address the problem without additional steps of reward learning and offline RL training for the case when demonstrations contain a large proportion of suboptimal data. Built upon behavioral cloning (BC), we introduce an additional discriminator to distinguish expert and non-expert data. We propose a cooperation framework to boost the learning of both tasks. Based on this framework, we design a new IL algorithm, where the outputs of the discriminator serve as the weights of the BC loss. Experimental results show that our proposed algorithm achieves higher returns and faster training speed compared to baseline algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22l.html PDF: https://proceedings.mlr.press/v162/xu22l/xu22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haoran family: Xu - given: Xianyuan family: Zhan - given: Honglei family: Yin - given: Huiling family: Qin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24725-24742 id: xu22l issued: date-parts: - 2022 - 6 - 28 firstpage: 24725 lastpage: 24742 published: 2022-06-28 00:00:00 +0000 - title: 'Adversarial Attack and Defense for Non-Parametric Two-Sample Tests' abstract: 'Non-parametric two-sample tests (TSTs), which judge whether two sets of samples are drawn from the same distribution, have been widely used in the analysis of critical data. People tend to employ TSTs as trusted basic tools and rarely have any doubt about their reliability. This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks and then proposes corresponding defense strategies.
First, we theoretically show that an adversary can upper-bound the distributional shift which guarantees the attack’s invisibility. Furthermore, we theoretically find that the adversary can also degrade the lower bound of a TST’s test power, which enables us to iteratively minimize the test criterion in order to search for adversarial pairs. To enable TST-agnostic attacks, we propose an ensemble attack (EA) framework that jointly minimizes the different types of test criteria. Second, to robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels. Extensive experiments on both simulated and real-world datasets validate the adversarial vulnerabilities of non-parametric TSTs and the effectiveness of our proposed defense. Source code is available at https://github.com/GodXuxilie/Robust-TST.git.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22m.html PDF: https://proceedings.mlr.press/v162/xu22m/xu22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xilie family: Xu - given: Jingfeng family: Zhang - given: Feng family: Liu - given: Masashi family: Sugiyama - given: Mohan family: Kankanhalli editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24743-24769 id: xu22m issued: date-parts: - 2022 - 6 - 28 firstpage: 24743 lastpage: 24769 published: 2022-06-28 00:00:00 +0000 - title: 'Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization' abstract: 'Machine learning (ML) robustness and domain generalization are fundamentally correlated: they essentially concern data distribution shifts under adversarial and natural settings, respectively. On one hand, recent studies show that more robust (adversarially trained) models are more generalizable. On the other hand, there is a lack of theoretical understanding of their fundamental connections. In this paper, we explore the relationship between regularization and domain transferability considering different factors such as norm regularization and data augmentations (DA). We propose a general theoretical framework proving that factors involving the model function class regularization are sufficient conditions for relative domain transferability. Our analysis implies that “robustness" is neither necessary nor sufficient for transferability; rather, regularization is a more fundamental perspective for understanding domain transferability. We then discuss popular DA protocols (including adversarial training) and show when they can be viewed as the function class regularization under certain conditions and therefore improve generalization. We conduct extensive experiments to verify our theoretical findings and show several counterexamples where robustness and generalization are negatively correlated on different datasets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/xu22n.html PDF: https://proceedings.mlr.press/v162/xu22n/xu22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiaojun family: Xu - given: Jacky Y family: Zhang - given: Evelyn family: Ma - given: Hyun Ho family: Son - given: Sanmi family: Koyejo - given: Bo family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24770-24802 id: xu22n issued: date-parts: - 2022 - 6 - 28 firstpage: 24770 lastpage: 24802 published: 2022-06-28 00:00:00 +0000 - title: 'A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization' abstract: 'Covariate-shift generalization, a typical case in out-of-distribution (OOD) generalization, requires a good performance on the unknown test distribution, which varies from the accessible training distribution in the form of covariate shift. Recently, independence-driven importance weighting algorithms in stable learning literature have shown empirical effectiveness to deal with covariate-shift generalization on several learning models, including regression algorithms and deep neural networks, while their theoretical analyses are missing. In this paper, we theoretically prove the effectiveness of such algorithms by explaining them as feature selection processes. We first specify a set of variables, named minimal stable variable set, that is the minimal and optimal set of variables to deal with covariate-shift generalization for common loss functions, such as the mean squared loss and binary cross-entropy loss. Afterward, we prove that under ideal conditions, independence-driven importance weighting algorithms could identify the variables in this set. Analysis of asymptotic properties is also provided. These theories are further validated in several synthetic experiments.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22o.html PDF: https://proceedings.mlr.press/v162/xu22o/xu22o.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22o.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Renzhe family: Xu - given: Xingxuan family: Zhang - given: Zheyan family: Shen - given: Tong family: Zhang - given: Peng family: Cui editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24803-24829 id: xu22o issued: date-parts: - 2022 - 6 - 28 firstpage: 24803 lastpage: 24829 published: 2022-06-28 00:00:00 +0000 - title: 'Langevin Monte Carlo for Contextual Bandits' abstract: 'We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high dimensional applications for general covariance matrices. Moreover, the Gaussian approximation may not be a good surrogate for the posterior distribution for general reward generating functions. 
We propose an efficient posterior sampling algorithm, viz., Langevin Monte Carlo Thompson Sampling (LMC-TS), that uses Markov Chain Monte Carlo (MCMC) methods to directly sample from the posterior distribution in contextual bandits. Our method is computationally efficient since it only needs to perform noisy gradient descent updates without constructing the Laplace approximation of the posterior distribution. We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits, viz., linear contextual bandits. We conduct experiments on both synthetic data and real-world datasets on different contextual bandit models, which demonstrates that directly sampling from the posterior is both computationally efficient and competitive in performance.' volume: 162 URL: https://proceedings.mlr.press/v162/xu22p.html PDF: https://proceedings.mlr.press/v162/xu22p/xu22p.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xu22p.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pan family: Xu - given: Hongkai family: Zheng - given: Eric V family: Mazumdar - given: Kamyar family: Azizzadenesheli - given: Animashree family: Anandkumar editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24830-24850 id: xu22p issued: date-parts: - 2022 - 6 - 28 firstpage: 24830 lastpage: 24850 published: 2022-06-28 00:00:00 +0000 - title: 'Investigating Why Contrastive Learning Benefits Robustness against Label Noise' abstract: 'Self-supervised Contrastive Learning (CL) has been recently shown to be very effective in preventing deep networks from overfitting noisy labels. Despite its empirical success, the theoretical understanding of the effect of contrastive learning on boosting robustness is very limited. In this work, we rigorously prove that the representation matrix learned by contrastive learning boosts robustness, by having: (i) one prominent singular value corresponding to each sub-class in the data, and significantly smaller remaining singular values; and (ii) a large alignment between the prominent singular vectors and the clean labels of each sub-class. The above properties enable a linear layer trained on such representations to effectively learn the clean labels without overfitting the noise. We further show that the low-rank structure of the Jacobian of deep networks pre-trained with contrastive learning allows them to achieve a superior performance initially, when fine-tuned on noisy labels. Finally, we demonstrate that the initial robustness provided by contrastive learning enables robust training methods to achieve state-of-the-art performance under extreme noise levels, e.g., an average of 27.18% and 15.58% increase in accuracy on CIFAR-10 and CIFAR-100 with 80% symmetric noisy labels, and 4.11% increase in accuracy on WebVision.' 
volume: 162 URL: https://proceedings.mlr.press/v162/xue22a.html PDF: https://proceedings.mlr.press/v162/xue22a/xue22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-xue22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yihao family: Xue - given: Kyle family: Whitecross - given: Baharan family: Mirzasoleiman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24851-24871 id: xue22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24851 lastpage: 24871 published: 2022-06-28 00:00:00 +0000 - title: 'Diversified Adversarial Attacks based on Conjugate Gradient Method' abstract: 'Deep learning models are vulnerable to adversarial examples, and adversarial attacks used to generate such examples have attracted considerable research interest. Although existing methods based on the steepest descent have achieved high attack success rates, ill-conditioned problems occasionally reduce their performance. To address this limitation, we utilize the conjugate gradient (CG) method, which is effective for this type of problem, and propose a novel attack algorithm inspired by the CG method, named the Auto Conjugate Gradient (ACG) attack. The results of large-scale evaluation experiments conducted on the latest robust models show that, for most models, ACG was able to find more adversarial examples with fewer iterations than the existing SOTA algorithm Auto-PGD (APGD). We investigated the difference in search performance between ACG and APGD in terms of diversification and intensification, and define a measure called Diversity Index (DI) to quantify the degree of diversity. From the analysis of the diversity using this index, we show that the more diverse search of the proposed method remarkably improves its attack success rate.' volume: 162 URL: https://proceedings.mlr.press/v162/yamamura22a.html PDF: https://proceedings.mlr.press/v162/yamamura22a/yamamura22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yamamura22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Keiichiro family: Yamamura - given: Haruki family: Sato - given: Nariaki family: Tateiwa - given: Nozomi family: Hata - given: Toru family: Mitsutake - given: Issa family: Oe - given: Hiroki family: Ishikura - given: Katsuki family: Fujisawa editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24872-24894 id: yamamura22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24872 lastpage: 24894 published: 2022-06-28 00:00:00 +0000 - title: 'Cycle Representation Learning for Inductive Relation Prediction' abstract: 'In recent years, algebraic topology and its modern development, the theory of persistent homology, has shown great potential in graph representation learning. In this paper, based on the mathematics of algebraic topology, we propose a novel solution for inductive relation prediction, an important learning task for knowledge graph completion. 
To predict the relation between two entities, one can use the existence of rules, namely a sequence of relations. Previous works view rules as paths and primarily focus on searching for paths between entities. The space of rules is huge, and one has to sacrifice either efficiency or accuracy. In this paper, we consider rules as cycles and show that the space of cycles has a unique structure based on the mathematics of algebraic topology. By exploring the linear structure of the cycle space, we can improve the search efficiency of rules. We propose to collect cycle bases that span the space of cycles. We build a novel GNN framework on the collected cycles to learn the representations of cycles, and to predict the existence/non-existence of a relation. Our method achieves state-of-the-art performance on benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/yan22a.html PDF: https://proceedings.mlr.press/v162/yan22a/yan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zuoyu family: Yan - given: Tengfei family: Ma - given: Liangcai family: Gao - given: Zhi family: Tang - given: Chao family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24895-24910 id: yan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24895 lastpage: 24910 published: 2022-06-28 00:00:00 +0000 - title: 'Optimally Controllable Perceptual Lossy Compression' abstract: 'Recent studies in lossy compression show that distortion and perceptual quality are at odds with each other, which puts forward the tradeoff between distortion and perception (D-P). Intuitively, to attain different perceptual quality, different decoders have to be trained. In this paper, we present a nontrivial finding that only two decoders are sufficient for optimally achieving arbitrary (an infinite number of different) D-P tradeoffs. We prove that arbitrary points of the D-P tradeoff bound can be achieved by a simple linear interpolation between the outputs of a minimum MSE decoder and a specifically constructed perfect perceptual decoder. Meanwhile, the perceptual quality (in terms of the squared Wasserstein-2 distance metric) can be quantitatively controlled by the interpolation factor. Furthermore, to construct a perfect perceptual decoder, we propose two theoretically optimal training frameworks. Unlike the distortion-plus-adversarial-loss based heuristic framework widely used in existing methods, the new frameworks are not only theoretically optimal but can also yield state-of-the-art performance in practical perceptual decoding. Finally, we validate our theoretical finding and demonstrate the superiority of our frameworks via experiments.
Code is available at: https://github.com/ZeyuYan/ControllablePerceptual-Compression' volume: 162 URL: https://proceedings.mlr.press/v162/yan22b.html PDF: https://proceedings.mlr.press/v162/yan22b/yan22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yan22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zeyu family: Yan - given: Fei family: Wen - given: Peilin family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24911-24928 id: yan22b issued: date-parts: - 2022 - 6 - 28 firstpage: 24911 lastpage: 24928 published: 2022-06-28 00:00:00 +0000 - title: 'Active fairness auditing' abstract: 'The fast-spreading adoption of machine learning (ML) by companies across industries poses significant regulatory challenges. One such challenge is scalability: how can regulatory bodies efficiently audit these ML models, ensuring that they are fair? In this paper, we initiate the study of query-based auditing algorithms that can estimate the demographic parity of ML models in a query-efficient manner. We propose an optimal deterministic algorithm, as well as a practical randomized, oracle-efficient algorithm with comparable guarantees. Furthermore, we make inroads into understanding the optimal query complexity of randomized active fairness estimation algorithms. Our first exploration of active fairness estimation aims to put AI governance on firmer theoretical foundations.' volume: 162 URL: https://proceedings.mlr.press/v162/yan22c.html PDF: https://proceedings.mlr.press/v162/yan22c/yan22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yan22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tom family: Yan - given: Chicheng family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24929-24962 id: yan22c issued: date-parts: - 2022 - 6 - 28 firstpage: 24929 lastpage: 24962 published: 2022-06-28 00:00:00 +0000 - title: 'Self-Organized Polynomial-Time Coordination Graphs' abstract: 'Coordination graph is a promising approach to model agent collaboration in multi-agent reinforcement learning. It conducts a graph-based value factorization and induces explicit coordination among agents to complete complicated tasks. However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values. This corresponds to the decentralized constraint optimization problem (DCOP), which is NP-hard, as is its constant-ratio approximation. To bypass this systematic hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the accuracy and the computational efficiency of collaborative action selection. SOP-CG employs dynamic graph topology to ensure sufficient value function expressiveness. The graph selection is unified into an end-to-end learning paradigm.
In experiments, we show that our approach learns succinct and well-adapted graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22a.html PDF: https://proceedings.mlr.press/v162/yang22a/yang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qianlan family: Yang - given: Weijun family: Dong - given: Zhizhou family: Ren - given: Jianhao family: Wang - given: Tonghan family: Wang - given: Chongjie family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24963-24979 id: yang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 24963 lastpage: 24979 published: 2022-06-28 00:00:00 +0000 - title: 'Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning' abstract: 'Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets, without interacting with the underlying environment during the learning process. A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy. To avoid the detrimental impact of distribution mismatch, we regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process. Further, we train a dynamics model to both implement this regularization and better estimate the stationary distribution of the current policy, reducing the error induced by distribution mismatch. On a wide range of continuous-control offline RL datasets, our method indicates competitive performance, which validates our algorithm. The code is publicly available.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22b.html PDF: https://proceedings.mlr.press/v162/yang22b/yang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shentao family: Yang - given: Yihao family: Feng - given: Shujian family: Zhang - given: Mingyuan family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 24980-25006 id: yang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 24980 lastpage: 25006 published: 2022-06-28 00:00:00 +0000 - title: 'A Psychological Theory of Explainability' abstract: 'The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. 
We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent explanation, humans expect the AI to make similar decisions to themselves, and that they interpret an explanation by comparison to the explanations they themselves would give. Comparison is formalized via Shepard’s universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants’ predictions of the AI.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22c.html PDF: https://proceedings.mlr.press/v162/yang22c/yang22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Scott Cheng-Hsin family: Yang - given: Nils Erik Tomas family: Folke - given: Patrick family: Shafto editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25007-25021 id: yang22c issued: date-parts: - 2022 - 6 - 28 firstpage: 25007 lastpage: 25021 published: 2022-06-28 00:00:00 +0000 - title: 'Omni-Granular Ego-Semantic Propagation for Self-Supervised Graph Representation Learning' abstract: 'Unsupervised/self-supervised graph representation learning is critical for downstream node- and graph-level classification tasks. The global structure of graphs helps discriminate representations, and existing methods mainly utilize the global structure by imposing additional supervisions. However, their global semantics are usually invariant for all nodes/graphs and they fail to explicitly embed the global semantics to enrich the representations. In this paper, we propose Omni-Granular Ego-Semantic Propagation for Self-Supervised Graph Representation Learning (OEPG). Specifically, we introduce instance-adaptive global-aware ego-semantic descriptors, leveraging the first- and second-order feature differences between each node/graph and hierarchical global clusters of the entire graph dataset. The descriptors can be explicitly integrated into local graph convolution as new neighbor nodes. Besides, we design an omni-granular normalization on the whole scales and hierarchies of the ego-semantic to assign attentional weight to each descriptor from an omni-granular perspective. Specialized pretext tasks and cross-iteration momentum update are further developed for local-global mutual adaptation. In downstream tasks, OEPG consistently achieves the best performance with a 2%~6% accuracy gain on multiple datasets across scales and domains. Notably, OEPG also generalizes to quantity- and topology-imbalance scenarios.'
volume: 162 URL: https://proceedings.mlr.press/v162/yang22d.html PDF: https://proceedings.mlr.press/v162/yang22d/yang22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ling family: Yang - given: Shenda family: Hong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25022-25037 id: yang22d issued: date-parts: - 2022 - 6 - 28 firstpage: 25022 lastpage: 25037 published: 2022-06-28 00:00:00 +0000 - title: 'Unsupervised Time-Series Representation Learning with Iterative Bilinear Temporal-Spectral Fusion' abstract: 'Unsupervised/self-supervised time series representation learning is a challenging problem because of its complex dynamics and sparse annotations. Existing works mainly adopt the framework of contrastive learning with time-based augmentation techniques to sample positives and negatives for contrastive training. Nevertheless, they mostly use segment-level augmentation derived from time slicing, which may bring about sampling bias and incorrect optimization with false negatives due to the loss of global context. Besides, they pay no attention to incorporating the spectral information in feature representation. In this paper, we propose a unified framework, namely Bilinear Temporal-Spectral Fusion (BTSF). Specifically, we first utilize instance-level augmentation with a simple dropout on the entire time series for maximally capturing long-term dependencies. We devise a novel iterative bilinear temporal-spectral fusion to explicitly encode the affinities of abundant time-frequency pairs, and iteratively refine representations in a fusion-and-squeeze manner with Spectrum-to-Time (S2T) and Time-to-Spectrum (T2S) Aggregation modules. We conduct downstream evaluations on three major tasks for time series, including classification, forecasting and anomaly detection. Experimental results show that our BTSF consistently and significantly outperforms the state-of-the-art methods.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22e.html PDF: https://proceedings.mlr.press/v162/yang22e/yang22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ling family: Yang - given: Shenda family: Hong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25038-25054 id: yang22e issued: date-parts: - 2022 - 6 - 28 firstpage: 25038 lastpage: 25054 published: 2022-06-28 00:00:00 +0000 - title: 'Searching for BurgerFormer with Micro-Meso-Macro Space Design' abstract: 'With the success of Transformers in the computer vision field, the automated design of vision Transformers has attracted significant attention. Recently, MetaFormer found that simple average pooling can achieve impressive performance, which naturally raises the question of how to design a search space to search diverse and high-performance Transformer-like architectures.
By revisiting typical search spaces, we design a micro-meso-macro space to search for Transformer-like architectures, namely BurgerFormer. Micro, meso, and macro correspond to the granularity levels of operation, block and stage, respectively. At the microscopic level, we enrich the atomic operations to include various normalizations, activation functions, and basic operations (e.g., multi-head self attention, average pooling). At the mesoscopic level, a hamburger structure is searched out as the basic BurgerFormer block. At the macroscopic level, we search for the depth, width, and expansion ratio of the network based on the multi-stage architecture. Meanwhile, we propose a hybrid sampling method for effectively training the supernet. Experimental results demonstrate that the searched BurgerFormer architectures achieve comparable or even superior performance compared with current state-of-the-art Transformers on the ImageNet and COCO datasets. The code is available at https://github.com/xingxing-123/BurgerFormer.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22f.html PDF: https://proceedings.mlr.press/v162/yang22f/yang22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Longxing family: Yang - given: Yu family: Hu - given: Shun family: Lu - given: Zihao family: Sun - given: Jilin family: Mei - given: Yinhe family: Han - given: Xiaowei family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25055-25069 id: yang22f issued: date-parts: - 2022 - 6 - 28 firstpage: 25055 lastpage: 25069 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Variance Reduction for Meta-learning' abstract: 'Meta-learning tries to learn meta-knowledge from a large number of tasks. However, the stochastic meta-gradient can have large variance due to data sampling (from each task) and task sampling (from the whole task distribution), leading to slow convergence. In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. It retains the bilevel formulation which better captures the structure of meta-learning, but does not require storing the vast number of task-specific parameters in general bilevel variance reduction methods. Theoretical results show that it has a fast convergence rate due to variance reduction. Experiments on benchmark few-shot classification data sets demonstrate its effectiveness over state-of-the-art meta-learning algorithms with and without variance reduction.'
volume: 162 URL: https://proceedings.mlr.press/v162/yang22g.html PDF: https://proceedings.mlr.press/v162/yang22g/yang22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hansi family: Yang - given: James family: Kwok editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25070-25095 id: yang22g issued: date-parts: - 2022 - 6 - 28 firstpage: 25070 lastpage: 25095 published: 2022-06-28 00:00:00 +0000 - title: 'Injecting Logical Constraints into Neural Networks via Straight-Through Estimators' abstract: 'Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI. We find that a straight-through-estimator, a method introduced to train binary neural networks, could effectively be applied to incorporate logical constraints into neural network learning. More specifically, we design a systematic way to represent discrete logical constraints as a loss function; minimizing this loss using gradient descent via a straight-through-estimator updates the neural network’s weights in the direction that the binarized outputs satisfy the logical constraints. The experimental results show that by leveraging GPUs and batch training, this method scales significantly better than existing neuro-symbolic methods that require heavy symbolic computation for computing gradients. Also, we demonstrate that our method applies to different types of neural networks, such as MLP, CNN, and GNN, making them learn with no or fewer labeled data by learning directly from known constraints.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22h.html PDF: https://proceedings.mlr.press/v162/yang22h/yang22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhun family: Yang - given: Joohyung family: Lee - given: Chiyoun family: Park editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25096-25122 id: yang22h issued: date-parts: - 2022 - 6 - 28 firstpage: 25096 lastpage: 25122 published: 2022-06-28 00:00:00 +0000 - title: 'Locally Sparse Neural Networks for Tabular Biomedical Data' abstract: 'Tabular datasets with low-sample-size or many variables are prevalent in biomedicine. Practitioners in this domain prefer linear or tree-based models over neural networks since the latter are harder to interpret and tend to overfit when applied to tabular datasets. To address these neural networks’ shortcomings, we propose an intrinsically interpretable network for heterogeneous biomedical data. We design a locally sparse neural network where the local sparsity is learned to identify the subset of most relevant features for each sample. This sample-specific sparsity is predicted via a gating network, which is trained in tandem with the prediction network. 
By forcing the model to select a subset of the most informative features for each sample, we reduce model overfitting in low-sample-size data and obtain an interpretable model. We demonstrate that our method outperforms state-of-the-art models when applied to synthetic or real-world biomedical datasets using extensive experiments. Furthermore, the proposed framework dramatically outperforms existing schemes when evaluating its interpretability capabilities. Finally, we demonstrate the applicability of our model to two important biomedical tasks: survival analysis and marker gene identification.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22i.html PDF: https://proceedings.mlr.press/v162/yang22i/yang22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junchen family: Yang - given: Ofir family: Lindenbaum - given: Yuval family: Kluger editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25123-25153 id: yang22i issued: date-parts: - 2022 - 6 - 28 firstpage: 25123 lastpage: 25153 published: 2022-06-28 00:00:00 +0000 - title: 'Not All Poisons are Created Equal: Robust Training against Data Poisoning' abstract: 'Data poisoning causes misclassification of test time target examples, by injecting maliciously crafted samples in the training data. Existing defenses are often effective only against a specific type of targeted attack, significantly degrade the generalization performance, or are prohibitive for standard deep learning pipelines. In this work, we propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks, and provides theoretical guarantees for the performance of the model. Targeted attacks work by adding bounded perturbations to a randomly selected subset of training data to match the targets’ gradient or representation. We show that: (i) under bounded perturbations, only a number of poisons can be optimized to have a gradient that is close enough to that of the target and make the attack successful; (ii) such effective poisons move away from their original class and get isolated in the gradient space; (iii) dropping examples in low-density gradient regions during training can successfully eliminate the effective poisons, and guarantees similar training dynamics to that of training on full data. Our extensive experiments show that our method significantly decreases the success rate of state-of-the-art targeted attacks, including Gradient Matching and Bullseye Polytope, and easily scales to large datasets.' 
volume: 162 URL: https://proceedings.mlr.press/v162/yang22j.html PDF: https://proceedings.mlr.press/v162/yang22j/yang22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yu family: Yang - given: Tian Yu family: Liu - given: Baharan family: Mirzasoleiman editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25154-25165 id: yang22j issued: date-parts: - 2022 - 6 - 28 firstpage: 25154 lastpage: 25165 published: 2022-06-28 00:00:00 +0000 - title: 'Does the Data Induce Capacity Control in Deep Learning?' abstract: 'We show that the input correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. This structure is mirrored in a network trained on this data: we show that the Hessian and the Fisher Information Matrix (FIM) have eigenvalues that are spread uniformly over exponentially large ranges. We call such eigenspectra “sloppy” because sets of weights corresponding to small eigenvalues can be changed by large magnitudes without affecting the loss. Networks trained on atypical datasets with non-sloppy inputs do not share these traits and deep networks trained on such datasets generalize poorly. Inspired by this, we study the hypothesis that sloppiness of inputs aids generalization in deep networks. We show that if the Hessian is sloppy, we can compute non-vacuous PAC-Bayes generalization bounds analytically. By exploiting our empirical observation that training predominantly takes place in the non-sloppy subspace of the FIM, we develop data-distribution dependent PAC-Bayes priors that lead to accurate generalization bounds using numerical optimization.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22k.html PDF: https://proceedings.mlr.press/v162/yang22k/yang22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Rubing family: Yang - given: Jialin family: Mao - given: Pratik family: Chaudhari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25166-25197 id: yang22k issued: date-parts: - 2022 - 6 - 28 firstpage: 25166 lastpage: 25197 published: 2022-06-28 00:00:00 +0000 - title: 'Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity' abstract: 'By integrating domain knowledge with labeled samples, informed machine learning has been emerging to improve the learning performance for a wide range of applications. Nonetheless, rigorous understanding of the role of injected domain knowledge has been under-explored. In this paper, we consider an informed deep neural network (DNN) with over-parameterization and domain knowledge integrated into its training objective function, and study how and why domain knowledge benefits the performance. 
Concretely, we quantitatively demonstrate the two benefits of domain knowledge in informed learning {—} regularizing the label-based supervision and supplementing the labeled samples {—} and reveal the trade-off between label and knowledge imperfectness in the bound of the population risk. Based on the theoretical analysis, we propose a generalized informed training objective to better exploit the benefits of knowledge and balance the label and knowledge imperfectness, which is validated by the population risk bound. Our analysis on sampling complexity sheds light on how to choose the hyper-parameters for informed learning, and further justifies the advantages of knowledge-informed learning.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22l.html PDF: https://proceedings.mlr.press/v162/yang22l/yang22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jianyi family: Yang - given: Shaolei family: Ren editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25198-25240 id: yang22l issued: date-parts: - 2022 - 6 - 28 firstpage: 25198 lastpage: 25240 published: 2022-06-28 00:00:00 +0000 - title: 'Linear Bandit Algorithms with Sublinear Time Complexity' abstract: 'We propose two linear bandit algorithms with per-step complexity sublinear in the number of arms $K$. The algorithms are designed for applications where the arm set is extremely large and slowly changing. Our key realization is that choosing an arm reduces to a maximum inner product search (MIPS) problem, which can be solved approximately without breaking regret guarantees. Existing approximate MIPS solvers run in sublinear time. We extend those solvers and present theoretical guarantees for online learning problems, where adaptivity (i.e., a later step depends on the feedback in previous steps) becomes a unique challenge. We then explicitly characterize the tradeoff between the per-step complexity and regret. For sufficiently large $K$, our algorithms have sublinear per-step complexity and $\widetilde O(\sqrt{T})$ regret. Empirically, we evaluate our proposed algorithms in a synthetic environment and a real-world online movie recommendation problem. Our proposed algorithms can deliver a more than 72 times speedup compared to the linear time baselines while retaining similar regret.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22m.html PDF: https://proceedings.mlr.press/v162/yang22m/yang22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shuo family: Yang - given: Tongzheng family: Ren - given: Sanjay family: Shakkottai - given: Eric family: Price - given: Inderjit S.
family: Dhillon - given: Sujay family: Sanghavi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25241-25260 id: yang22m issued: date-parts: - 2022 - 6 - 28 firstpage: 25241 lastpage: 25260 published: 2022-06-28 00:00:00 +0000 - title: 'A New Perspective on the Effects of Spectrum in Graph Neural Networks' abstract: 'Many improvements on GNNs can be deemed as operations on the spectrum of the underlying graph matrix, which motivates us to directly study the characteristics of the spectrum and their effects on GNN performance. By generalizing most existing GNN architectures, we show that the correlation issue caused by the unsmooth spectrum becomes the obstacle to leveraging more powerful graph filters as well as developing deep architectures, which therefore restricts GNNs’ performance. Inspired by this, we propose the correlation-free architecture which naturally removes the correlation issue among different channels, making it possible to utilize more sophisticated filters within each channel. The final correlation-free architecture with more powerful filters consistently boosts the performance of learning graph representations. Code is available at https://github.com/qslim/gnn-spectrum.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22n.html PDF: https://proceedings.mlr.press/v162/yang22n/yang22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mingqi family: Yang - given: Yanming family: Shen - given: Rui family: Li - given: Heng family: Qi - given: Qiang family: Zhang - given: Baocai family: Yin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25261-25279 id: yang22n issued: date-parts: - 2022 - 6 - 28 firstpage: 25261 lastpage: 25279 published: 2022-06-28 00:00:00 +0000 - title: 'Fourier Learning with Cyclical Data' abstract: 'Many machine learning models for online applications, such as recommender systems, are often trained on data with cyclical properties. These data sequentially arrive from a time-varying distribution that is periodic in time. Existing algorithms either use streaming learning to track a time-varying set of optimal model parameters, yielding a dynamic regret that scales linearly in time; or partition the data of each cycle into multiple segments and train a separate model for each—a pluralistic approach that is computationally and storage-wise expensive. In this paper, we have designed a novel approach to overcome the aforementioned shortcomings. Our method, named "Fourier learning", encodes the periodicity into the model representation using a partial Fourier sequence, and trains the coefficient functions modeled by neural networks. Particularly, we design a Fourier multi-layer perceptron (F-MLP) that can be trained on streaming data with stochastic gradient descent (streaming-SGD), and we derive its convergence guarantees. 
We demonstrate Fourier learning’s better performance with extensive experiments on synthetic and public datasets, as well as on a large-scale recommender system that is updated in real-time, and trained with tens of millions of samples per day.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22o.html PDF: https://proceedings.mlr.press/v162/yang22o/yang22o.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22o.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yingxiang family: Yang - given: Zhihan family: Xiong - given: Tianyi family: Liu - given: Taiqing family: Wang - given: Chong family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25280-25301 id: yang22o issued: date-parts: - 2022 - 6 - 28 firstpage: 25280 lastpage: 25301 published: 2022-06-28 00:00:00 +0000 - title: 'Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network' abstract: 'In label-noise learning, estimating the transition matrix is a hot topic as the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., clean-label transition matrix (CLTM)) has been widely exploited to learn a clean label classifier by employing the noisy data. Motivated by the fact that classifiers mostly output Bayes optimal labels for prediction, in this paper, we propose to directly model the transition from Bayes optimal labels to noisy labels (i.e., Bayes-label transition matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. Note that given only noisy data, it is ill-posed to estimate either the CLTM or the BLTM. But favorably, Bayes optimal labels have less uncertainty compared with the clean labels, i.e., the class posteriors of Bayes optimal labels are one-hot vectors while those of clean labels are not. This yields two advantages for estimating the BLTM, i.e., (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected out of noisy data; (b) the feasible solution space is much smaller. By exploiting these advantages, we estimate the BLTM parametrically by employing a deep neural network, leading to better generalization and superior classification performance.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22p.html PDF: https://proceedings.mlr.press/v162/yang22p/yang22p.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22p.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shuo family: Yang - given: Erkun family: Yang - given: Bo family: Han - given: Yang family: Liu - given: Min family: Xu - given: Gang family: Niu - given: Tongliang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25302-25312 id: yang22p issued: date-parts: - 2022 - 6 - 28 firstpage: 25302 lastpage: 25312 published: 2022-06-28 00:00:00 +0000 - title: 'A Study of Face Obfuscation in ImageNet' abstract: 'Face obfuscation (blurring, mosaicing, etc.)
has been shown to be effective for privacy protection; nevertheless, object recognition research typically assumes access to complete, unobfuscated images. In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. Most categories in the ImageNet challenge are not people categories; however, many incidental people appear in the images, and their privacy is a concern. We first annotate faces in the dataset. Then we demonstrate that face obfuscation has minimal impact on the accuracy of recognition models. Concretely, we benchmark multiple deep neural networks on obfuscated images and observe that the overall recognition accuracy drops only slightly (<= 1.0%). Further, we experiment with transfer learning to 4 downstream tasks (object recognition, scene recognition, face attribute classification, and object detection) and show that features learned on obfuscated images are equally transferable. Our work demonstrates the feasibility of privacy-aware visual recognition, improves the highly-used ImageNet challenge benchmark, and suggests an important path for future visual datasets. Data and code are available at https://github.com/princetonvisualai/imagenet-face-obfuscation.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22q.html PDF: https://proceedings.mlr.press/v162/yang22q/yang22q.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22q.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kaiyu family: Yang - given: Jacqueline H. family: Yau - given: Li family: Fei-Fei - given: Jia family: Deng - given: Olga family: Russakovsky editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25313-25330 id: yang22q issued: date-parts: - 2022 - 6 - 28 firstpage: 25313 lastpage: 25330 published: 2022-06-28 00:00:00 +0000 - title: 'Anarchic Federated Learning' abstract: 'Present-day federated learning (FL) systems deployed over edge networks consist of a large number of workers with high degrees of heterogeneity in data and/or computing capabilities, which call for flexible worker participation in terms of timing, effort, data heterogeneity, etc. To satisfy the need for flexible worker participation, we consider a new FL paradigm called “Anarchic Federated Learning” (AFL) in this paper. In stark contrast to conventional FL models, each worker in AFL has the freedom to choose i) when to participate in FL, and ii) the number of local steps to perform in each round based on its current situation (e.g., battery level, communication channels, privacy concerns). However, such chaotic worker behaviors in AFL raise many new open questions in algorithm design. In particular, it remains unclear whether one could develop convergent AFL training algorithms, and if so, under what conditions and at what convergence speed. Toward this end, we propose two Anarchic Federated Averaging (AFA) algorithms with two-sided learning rates for both cross-device and cross-silo settings, which are named AFA-CD and AFA-CS, respectively. Somewhat surprisingly, we show that, under mild anarchic assumptions, both AFL algorithms achieve the same best known convergence rate as state-of-the-art algorithms for conventional FL.
Moreover, they retain the highly desirable linear speedup effect with respect to both the number of workers and local steps in the new AFL paradigm. We validate the proposed algorithms with extensive experiments on real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/yang22r.html PDF: https://proceedings.mlr.press/v162/yang22r/yang22r.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22r.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haibo family: Yang - given: Xin family: Zhang - given: Prashant family: Khanduri - given: Jia family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25331-25363 id: yang22r issued: date-parts: - 2022 - 6 - 28 firstpage: 25331 lastpage: 25363 published: 2022-06-28 00:00:00 +0000 - title: 'Identity-Disentangled Adversarial Augmentation for Self-supervised Learning' abstract: 'Data augmentation is critical to contrastive self-supervised learning, whose goal is to distinguish a sample’s augmentations (positives) from other samples (negatives). However, strong augmentations may change the sample-identity of the positives, while weak augmentations produce easy positives/negatives, leading to nearly-zero loss and ineffective learning. In this paper, we study a simple adversarial augmentation method that can modify training data to be hard positives/negatives without distorting the key information about their original identities. In particular, we decompose a sample $x$ into its variational auto-encoder (VAE) reconstruction $G(x)$ plus the residual $R(x)=x-G(x)$, where $R(x)$ retains most identity-distinctive information due to an information-theoretic interpretation of the VAE objective. We then adversarially perturb $G(x)$ in the VAE’s bottleneck space and add it back to the original $R(x)$ as an augmentation, which is therefore sufficiently challenging for contrastive learning while keeping the sample identity intact. We apply this “identity-disentangled adversarial augmentation (IDAA)” to different self-supervised learning methods. On multiple benchmark datasets, IDAA consistently improves both their efficiency and generalization performance. We further show that IDAA learned on a dataset can be transferred to other datasets. Code is available at \href{https://github.com/kai-wen-yang/IDAA}{https://github.com/kai-wen-yang/IDAA}.'
volume: 162 URL: https://proceedings.mlr.press/v162/yang22s.html PDF: https://proceedings.mlr.press/v162/yang22s/yang22s.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yang22s.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kaiwen family: Yang - given: Tianyi family: Zhou - given: Xinmei family: Tian - given: Dacheng family: Tao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25364-25381 id: yang22s issued: date-parts: - 2022 - 6 - 28 firstpage: 25364 lastpage: 25381 published: 2022-06-28 00:00:00 +0000 - title: 'Learning from a Learning User for Optimal Recommendations' abstract: 'In real-world recommendation problems, especially those with a formidably large item space, users have to gradually learn to estimate the utility of any fresh recommendations from their experience about previously consumed items. This in turn affects their interaction dynamics with the system and can invalidate previous algorithms built on the omniscient user assumption. In this paper, we formalize a model to capture such ”learning users” and design an efficient system-side learning solution, coined Noise-Robust Active Ellipsoid Search (RAES), to confront the challenges brought by the non-stationary feedback from such a learning user. Interestingly, we prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning becomes worse, until reaching linear regret when the user’s learning fails to converge. Experiments on synthetic datasets demonstrate the strength of RAES for such a contemporaneous system-user learning problem. Our study provides a novel perspective on modeling the feedback loop in recommendation problems.' volume: 162 URL: https://proceedings.mlr.press/v162/yao22a.html PDF: https://proceedings.mlr.press/v162/yao22a/yao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fan family: Yao - given: Chuanhao family: Li - given: Denis family: Nekipelov - given: Hongning family: Wang - given: Haifeng family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25382-25406 id: yao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25382 lastpage: 25406 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Out-of-Distribution Robustness via Selective Augmentation' abstract: 'Machine learning algorithms typically assume that training and test examples are drawn from the same distribution. However, distribution shift is a common problem in real-world applications and can cause models to perform dramatically worse at test time. In this paper, we specifically consider the problems of subpopulation shifts (e.g., imbalanced data) and domain shifts. While prior works often seek to explicitly regularize internal representations or predictors of the model to be domain invariant, we instead aim to learn invariant predictors without restricting the model’s internal representations or predictors. 
This leads to a simple mixup-based technique which learns invariant predictors via selective augmentation called LISA. LISA selectively interpolates samples either with the same labels but different domains or with the same domain but different labels. Empirically, we study the effectiveness of LISA on nine benchmarks ranging from subpopulation shifts to domain shifts, and we find that LISA consistently outperforms other state-of-the-art methods and leads to more invariant predictors. We further analyze a linear setting and theoretically show how LISA leads to a smaller worst-group error.' volume: 162 URL: https://proceedings.mlr.press/v162/yao22b.html PDF: https://proceedings.mlr.press/v162/yao22b/yao22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yao22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Huaxiu family: Yao - given: Yu family: Wang - given: Sai family: Li - given: Linjun family: Zhang - given: Weixin family: Liang - given: James family: Zou - given: Chelsea family: Finn editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25407-25437 id: yao22b issued: date-parts: - 2022 - 6 - 28 firstpage: 25407 lastpage: 25437 published: 2022-06-28 00:00:00 +0000 - title: 'NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework' abstract: 'Pretrained language models have become the standard approach for many NLP tasks due to strong performance, but they are very expensive to train. We propose a simple and efficient learning framework, TLM, that does not rely on large-scale pretraining. Given some labeled task data and a large general corpus, TLM uses task data as queries to retrieve a tiny subset of the general corpus and jointly optimizes the task objective and the language modeling objective from scratch. On eight classification datasets in four domains, TLM achieves results better than or similar to pretrained language models (e.g., RoBERTa-Large) while reducing the training FLOPs by two orders of magnitude. With high accuracy and efficiency, we hope TLM will contribute to democratizing NLP and expediting its development.' volume: 162 URL: https://proceedings.mlr.press/v162/yao22c.html PDF: https://proceedings.mlr.press/v162/yao22c/yao22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yao22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xingcheng family: Yao - given: Yanan family: Zheng - given: Xiaocong family: Yang - given: Zhilin family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25438-25451 id: yao22c issued: date-parts: - 2022 - 6 - 28 firstpage: 25438 lastpage: 25451 published: 2022-06-28 00:00:00 +0000 - title: 'Feature Space Particle Inference for Neural Network Ensembles' abstract: 'Ensembles of deep neural networks demonstrate improved performance over single models. 
For enhancing the diversity of ensemble members while keeping their performance, particle-based inference methods offer a promising approach from a Bayesian perspective. However, the best way to apply these methods to neural networks is still unclear: seeking samples from the weight-space posterior suffers from inefficiency due to the over-parameterization issues, while seeking samples directly from the function-space posterior often leads to serious underfitting. In this study, we propose to optimize particles in the feature space where activations of a specific intermediate layer lie to alleviate the abovementioned difficulties. Our method encourages each member to capture distinct features, which are expected to increase the robustness of the ensemble prediction. Extensive evaluation on real-world datasets exhibits that our model significantly outperforms the gold-standard Deep Ensembles on various metrics, including accuracy, calibration, and robustness.' volume: 162 URL: https://proceedings.mlr.press/v162/yashima22a.html PDF: https://proceedings.mlr.press/v162/yashima22a/yashima22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yashima22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shingo family: Yashima - given: Teppei family: Suzuki - given: Kohta family: Ishikawa - given: Ikuro family: Sato - given: Rei family: Kawakami editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25452-25468 id: yashima22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25452 lastpage: 25468 published: 2022-06-28 00:00:00 +0000 - title: 'Centroid Approximation for Bootstrap: Improving Particle Quality at Inference' abstract: 'Bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive due to the need of drawing a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly optimize a small set of high quality “centroid” points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.' 
volume: 162 URL: https://proceedings.mlr.press/v162/ye22a.html PDF: https://proceedings.mlr.press/v162/ye22a/ye22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ye22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mao family: Ye - given: Qiang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25469-25489 id: ye22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25469 lastpage: 25489 published: 2022-06-28 00:00:00 +0000 - title: 'Be Like Water: Adaptive Floating Point for Machine Learning' abstract: 'In the pursuit of optimizing memory and compute density to accelerate machine learning applications, reduced precision training and inference has been an active area of research. While some approaches selectively apply low precision computations, this may require costly off-chip data transfers or mixed precision support. In this paper, we propose a novel numerical representation, Adaptive Floating Point (AFP), that dynamically adjusts to the characteristics of deep learning data. AFP requires no changes to the model topology, requires no additional training, and applies to all layers of DNN models. We evaluate AFP on a spectrum of representative models in computer vision and NLP, and show that our technique enables ultra-low precision inference of deep learning models while providing accuracy comparable to full precision inference. By dynamically adjusting to ML data, AFP increases memory density by 1.6x, 1.6x, and 3.2x and compute density by 4x, 1.3x, and 12x when compared to BFP, BFloat16, and FP32.' volume: 162 URL: https://proceedings.mlr.press/v162/yeh22a.html PDF: https://proceedings.mlr.press/v162/yeh22a/yeh22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yeh22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Thomas family: Yeh - given: Max family: Sterner - given: Zerlina family: Lai - given: Brandon family: Chuang - given: Alexander family: Ihler editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25490-25500 id: yeh22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25490 lastpage: 25500 published: 2022-06-28 00:00:00 +0000 - title: 'QSFL: A Two-Level Uplink Communication Optimization Framework for Federated Learning' abstract: 'In cross-device Federated Learning (FL), the communication cost of transmitting full-precision models between edge devices and a central server is a significant bottleneck, due to expensive, unreliable, and low-bandwidth wireless connections. As a solution, we propose a novel FL framework named QSFL, towards optimizing FL uplink (client-to-server) communication at both client and model levels. At the client level, we design a Qualification Judgment (QJ) algorithm to sample high-qualification clients to upload models. At the model level, we explore a Sparse Cyclic Sliding Segment (SCSS) algorithm to further compress transmitted models. 
We prove that QSFL can converge over wall-to-wall time, and develop an optimal hyperparameter searching algorithm based on theoretical analysis to enable QSFL to make the best trade-off between model accuracy and communication cost. Experimental results show that QSFL achieves state-of-the-art compression ratios with marginal model accuracy degradation.' volume: 162 URL: https://proceedings.mlr.press/v162/yi22a.html PDF: https://proceedings.mlr.press/v162/yi22a/yi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liping family: Yi - given: Wang family: Gang - given: Liu family: Xiaoguang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25501-25513 id: yi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25501 lastpage: 25513 published: 2022-06-28 00:00:00 +0000 - title: 'De novo mass spectrometry peptide sequencing with a transformer model' abstract: 'Tandem mass spectrometry is the only high-throughput method for analyzing the protein content of complex biological samples and is thus the primary technology driving the growth of the field of proteomics. A key outstanding challenge in this field involves identifying the sequence of amino acids -the peptide- responsible for generating each observed spectrum, without making use of prior knowledge in the form of a peptide sequence database. Although various machine learning methods have been developed to address this de novo sequencing problem, challenges that arise when modeling tandem mass spectra have led to complex models that combine multiple neural networks and post-processing steps. We propose a simple yet powerful method for de novo peptide sequencing, Casanovo, that uses a transformer framework to map directly from a sequence of observed peaks (a mass spectrum) to a sequence of amino acids (a peptide). Our experiments show that Casanovo achieves state-of-the-art performance on a benchmark dataset using a standard cross-species evaluation framework which involves testing with spectra with never-before-seen peptide labels. Casanovo not only achieves superior performance but does so at a fraction of the model complexity and inference time required by other methods.' 
volume: 162 URL: https://proceedings.mlr.press/v162/yilmaz22a.html PDF: https://proceedings.mlr.press/v162/yilmaz22a/yilmaz22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yilmaz22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Melih family: Yilmaz - given: William family: Fondrie - given: Wout family: Bittremieux - given: Sewoong family: Oh - given: William S family: Noble editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25514-25522 id: yilmaz22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25514 lastpage: 25522 published: 2022-06-28 00:00:00 +0000 - title: 'Bayesian Nonparametric Learning for Point Processes with Spatial Homogeneity: A Spatial Analysis of NBA Shot Locations' abstract: 'Basketball shot location data provide valuable summary information regarding players to coaches, sports analysts, fans, statisticians, as well as players themselves. Represented by spatial points, such data are naturally analyzed with spatial point process models. We present a novel nonparametric Bayesian method for learning the underlying intensity surface built upon a combination of Dirichlet process and Markov random field. Our method has the advantage of effectively encouraging local spatial homogeneity when estimating a globally heterogeneous intensity surface. Posterior inferences are performed with an efficient Markov chain Monte Carlo (MCMC) algorithm. Simulation studies show that the inferences are accurate and the method is superior compared to a wide range of competing methods. Application to the shot location data of $20$ representative NBA players in the 2017-2018 regular season offers interesting insights about the shooting patterns of these players. A comparison against the competing method shows that the proposed method can effectively incorporate spatial contiguity into the estimation of intensity surfaces.' volume: 162 URL: https://proceedings.mlr.press/v162/yin22a.html PDF: https://proceedings.mlr.press/v162/yin22a/yin22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yin22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Fan family: Yin - given: Jieying family: Jiao - given: Jun family: Yan - given: Guanyu family: Hu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25523-25551 id: yin22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25523 lastpage: 25551 published: 2022-06-28 00:00:00 +0000 - title: 'Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization' abstract: 'In practical federated learning scenarios, the participating devices may have different bitwidths for computation and memory storage by design. However, despite the progress made in device-heterogeneous federated learning scenarios, the heterogeneity in the bitwidth specifications in the hardware has been mostly overlooked. We introduce a pragmatic FL scenario with bitwidth heterogeneity across the participating devices, dubbed as Bitwidth Heterogeneous Federated Learning (BHFL). 
BHFL brings in a new challenge, that the aggregation of model parameters with different bitwidths could result in severe performance degeneration, especially for high-bitwidth models. To tackle this problem, we propose ProWD framework, which has a trainable weight dequantizer at the central server that progressively reconstructs the low-bitwidth weights into higher bitwidth weights, and finally into full-precision weights. ProWD further selectively aggregates the model parameters to maximize the compatibility across bit-heterogeneous weights. We validate ProWD against relevant FL baselines on the benchmark datasets, using clients with varying bitwidths. Our ProWD largely outperforms the baseline FL algorithms as well as naive approaches (e.g. grouped averaging) under the proposed BHFL scenario.' volume: 162 URL: https://proceedings.mlr.press/v162/yoon22a.html PDF: https://proceedings.mlr.press/v162/yoon22a/yoon22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yoon22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jaehong family: Yoon - given: Geon family: Park - given: Wonyong family: Jeong - given: Sung Ju family: Hwang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25552-25565 id: yoon22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25552 lastpage: 25565 published: 2022-06-28 00:00:00 +0000 - title: 'ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks' abstract: 'Neural networks (NNs) with intensive multiplications (e.g., convolutions and transformers) are powerful yet power hungry, impeding their more extensive deployment into resource-constrained edge devices. As such, multiplication-free networks, which follow a common practice in energy-efficient hardware implementation to parameterize NNs with more efficient operators (e.g., bitwise shifts and additions), have gained growing attention. However, multiplication-free networks in general under-perform their vanilla counterparts in terms of the achieved accuracy. To this end, this work advocates hybrid NNs that consist of both powerful yet costly multiplications and efficient yet less powerful operators for marrying the best of both worlds, and proposes ShiftAddNAS, which can automatically search for more accurate and more efficient NNs. Our ShiftAddNAS highlights two enablers. Specifically, it integrates (1) the first hybrid search space that incorporates both multiplication-based and multiplication-free operators for facilitating the development of both accurate and efficient hybrid NNs; and (2) a novel weight sharing strategy that enables effective weight sharing among different operators that follow heterogeneous distributions (e.g., Gaussian for convolutions vs. Laplacian for add operators) and simultaneously leads to a largely reduced supernet size and much better searched networks. Extensive experiments and ablation studies on various models, datasets, and tasks consistently validate the effectiveness of ShiftAddNAS, e.g., achieving up to a +7.7% higher accuracy or a +4.9 better BLEU score as compared to state-of-the-art expert-designed and neural architecture searched NNs, while leading to up to 93% or 69% energy and latency savings, respectively. 
Codes and pretrained models are available at https://github.com/RICE-EIC/ShiftAddNAS.' volume: 162 URL: https://proceedings.mlr.press/v162/you22a.html PDF: https://proceedings.mlr.press/v162/you22a/you22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-you22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haoran family: You - given: Baopu family: Li - given: Shi family: Huihong - given: Yonggan family: Fu - given: Yingyan family: Lin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25566-25580 id: you22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25566 lastpage: 25580 published: 2022-06-28 00:00:00 +0000 - title: 'Molecular Representation Learning via Heterogeneous Motif Graph Neural Networks' abstract: 'We consider feature representation learning problem of molecular graphs. Graph Neural Networks have been widely used in feature representation learning of molecular graphs. However, most existing methods deal with molecular graphs individually while neglecting their connections, such as motif-level relationships. We propose a novel molecular graph representation learning method by constructing a heterogeneous motif graph to address this issue. In particular, we build a heterogeneous motif graph that contains motif nodes and molecular nodes. Each motif node corresponds to a motif extracted from molecules. Then, we propose a Heterogeneous Motif Graph Neural Network (HM-GNN) to learn feature representations for each node in the heterogeneous motif graph. Our heterogeneous motif graph also enables effective multi-task learning, especially for small molecular datasets. To address the potential efficiency issue, we propose to use an edge sampler, which can significantly reduce computational resources usage. The experimental results show that our model consistently outperforms previous state-of-the-art models. Under multi-task settings, the promising performances of our methods on combined datasets shed light on a new learning paradigm for small molecular datasets. Finally, we show that our model achieves similar performances with significantly less computational resources by using our edge sampler.' volume: 162 URL: https://proceedings.mlr.press/v162/yu22a.html PDF: https://proceedings.mlr.press/v162/yu22a/yu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhaoning family: Yu - given: Hongyang family: Gao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25581-25594 id: yu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25581 lastpage: 25594 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Robust Overfitting of Adversarial Training and Beyond' abstract: 'Robust overfitting widely exists in adversarial training of deep networks. The exact underlying reasons for this are still not completely understood. 
Here, we explore the causes of robust overfitting by comparing the data distribution of non-overfit (weak adversary) and overfitted (strong adversary) adversarial training, and observe that the adversarial data generated by a weak adversary mainly consist of small-loss data. However, the adversarial data generated by a strong adversary are distributed more diversely over both the large-loss data and the small-loss data. Given these observations, we further design data ablation adversarial training and identify that some small-loss data that do not warrant the adversary’s strength cause robust overfitting in the strong adversary mode. To relieve this issue, we propose minimum loss constrained adversarial training (MLCAT): in a minibatch, we learn large-loss data as usual, and adopt additional measures to increase the loss of the small-loss data. Technically, MLCAT hinders the fitting of data once they become easy to learn, thereby preventing robust overfitting; philosophically, MLCAT reflects the spirit of turning waste into treasure and making the best use of each adversarial example; algorithmically, we design two realizations of MLCAT, and extensive experiments demonstrate that MLCAT can eliminate robust overfitting and further boost adversarial robustness.' volume: 162 URL: https://proceedings.mlr.press/v162/yu22b.html PDF: https://proceedings.mlr.press/v162/yu22b/yu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Chaojian family: Yu - given: Bo family: Han - given: Li family: Shen - given: Jun family: Yu - given: Chen family: Gong - given: Mingming family: Gong - given: Tongliang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25595-25610 id: yu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 25595 lastpage: 25610 published: 2022-06-28 00:00:00 +0000 - title: 'How to Leverage Unlabeled Data in Offline Reinforcement Learning' abstract: 'Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition. In many cases, labeling large datasets with rewards may be costly, especially if those rewards must be provided by human labelers, while collecting diverse unlabeled data might be comparatively inexpensive. How can we best leverage such unlabeled data in offline RL? One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data. In this paper, we find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing both in theory and in practice, without learning any reward model at all. While this approach might seem strange (and incorrect) at first, we provide extensive theoretical and empirical analysis that illustrates how it trades off reward bias, sample complexity and distributional shift, often leading to good results. We characterize conditions under which this simple strategy is effective, and further show that extending it with a simple reweighting approach can further alleviate the bias introduced by using incorrect reward labels.
Our empirical evaluation confirms these findings in simulated robotic locomotion, navigation, and manipulation settings.' volume: 162 URL: https://proceedings.mlr.press/v162/yu22c.html PDF: https://proceedings.mlr.press/v162/yu22c/yu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianhe family: Yu - given: Aviral family: Kumar - given: Yevgen family: Chebotar - given: Karol family: Hausman - given: Chelsea family: Finn - given: Sergey family: Levine editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25611-25635 id: yu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 25611 lastpage: 25635 published: 2022-06-28 00:00:00 +0000 - title: 'Reachability Constrained Reinforcement Learning' abstract: 'Constrained reinforcement learning (CRL) has gained significant interest recently, since safety constraints satisfaction is critical for real-world problems. However, existing CRL methods constraining discounted cumulative costs generally lack rigorous definition and guarantee of safety. In contrast, in the safe control research, safety is defined as persistently satisfying certain state constraints. Such persistent safety is possible only on a subset of the state space, called feasible set, where an optimal largest feasible set exists for a given environment. Recent studies incorporate feasible sets into CRL with energy-based methods such as control barrier function (CBF), safety index (SI), and leverage prior conservative estimations of feasible sets, which harms the performance of the learned policy. To deal with this problem, this paper proposes the reachability CRL (RCRL) method by using reachability analysis to establish the novel self-consistency condition and characterize the feasible sets. The feasible sets are represented by the safety value function, which is used as the constraint in CRL. We use the multi-time scale stochastic approximation theory to prove that the proposed algorithm converges to a local optimum, where the largest feasible set can be guaranteed. Empirical results on different benchmarks validate the learned feasible set, the policy performance, and constraint satisfaction of RCRL, compared to CRL and safe control baselines.' 
volume: 162 URL: https://proceedings.mlr.press/v162/yu22d.html PDF: https://proceedings.mlr.press/v162/yu22d/yu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dongjie family: Yu - given: Haitong family: Ma - given: Shengbo family: Li - given: Jianyu family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25636-25655 id: yu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 25636 lastpage: 25655 published: 2022-06-28 00:00:00 +0000 - title: 'Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning' abstract: 'Model compression is an essential technique for deploying deep neural networks (DNNs) on power and memory-constrained resources. However, existing model-compression methods often rely on human expertise and focus on parameters’ local importance, ignoring the rich topology information within DNNs. In this paper, we propose a novel multi-stage graph embedding technique based on graph neural networks (GNNs) to identify DNN topologies and use reinforcement learning (RL) to find a suitable compression policy. We performed resource-constrained (i.e., FLOPs) channel pruning and compared our approach with state-of-the-art model compression methods. We evaluated our method on various models from typical to mobile-friendly networks, such as ResNet family, VGG-16, MobileNet-v1/v2, and ShuffleNet. Results show that our method can achieve higher compression ratios with a minimal fine-tuning cost yet yields outstanding and competitive performance.' volume: 162 URL: https://proceedings.mlr.press/v162/yu22e.html PDF: https://proceedings.mlr.press/v162/yu22e/yu22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sixing family: Yu - given: Arya family: Mazaheri - given: Ali family: Jannesari editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25656-25667 id: yu22e issued: date-parts: - 2022 - 6 - 28 firstpage: 25656 lastpage: 25667 published: 2022-06-28 00:00:00 +0000 - title: 'The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks' abstract: 'Neural networks tend to achieve better accuracy with training if they are larger, even if the resulting models are overparameterized. Nevertheless, carefully removing such excess of parameters before, during, or after training may also produce models with similar or even improved accuracy. In many cases, that can be curiously achieved by heuristics as simple as removing a percentage of the weights with the smallest absolute value, even though absolute value is not a perfect proxy for weight relevance. With the premise that obtaining significantly better performance from pruning depends on accounting for the combined effect of removing multiple weights, we revisit one of the classic approaches for impact-based pruning: the Optimal Brain Surgeon (OBS).
We propose a tractable heuristic for solving the combinatorial extension of OBS, in which we select weights for simultaneous removal, and we combine it with a single-pass systematic update of unpruned weights. Our selection method outperforms other methods for high sparsity, and the single-pass weight update is also advantageous if applied after those methods.' volume: 162 URL: https://proceedings.mlr.press/v162/yu22f.html PDF: https://proceedings.mlr.press/v162/yu22f/yu22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xin family: Yu - given: Thiago family: Serra - given: Srikumar family: Ramalingam - given: Shandian family: Zhe editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25668-25683 id: yu22f issued: date-parts: - 2022 - 6 - 28 firstpage: 25668 lastpage: 25683 published: 2022-06-28 00:00:00 +0000 - title: 'GraphFM: Improving Large-Scale GNN Training via Feature Momentum' abstract: 'Training of graph neural networks (GNNs) for large-scale node classification is challenging. A key difficulty lies in obtaining accurate hidden node representations while avoiding the neighborhood explosion problem. Here, we propose a new technique, named feature momentum (FM), that uses a momentum step to incorporate historical embeddings when updating feature representations. We develop two specific algorithms, known as GraphFM-IB and GraphFM-OB, that consider in-batch and out-of-batch data, respectively. GraphFM-IB applies FM to in-batch sampled data, while GraphFM-OB applies FM to out-of-batch data that are 1-hop neighborhood of in-batch data. We provide a convergence analysis for GraphFM-IB and some theoretical insight for GraphFM-OB. Empirically, we observe that GraphFM-IB can effectively alleviate the neighborhood explosion problem of existing methods. In addition, GraphFM-OB achieves promising performance on multiple large-scale graph datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/yu22g.html PDF: https://proceedings.mlr.press/v162/yu22g/yu22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haiyang family: Yu - given: Limei family: Wang - given: Bokun family: Wang - given: Meng family: Liu - given: Tianbao family: Yang - given: Shuiwang family: Ji editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25684-25701 id: yu22g issued: date-parts: - 2022 - 6 - 28 firstpage: 25684 lastpage: 25701 published: 2022-06-28 00:00:00 +0000 - title: 'Latent Diffusion Energy-Based Model for Interpretable Text Modelling' abstract: 'Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling. Fueled by its flexibility in the formulation and strong modeling power of the latent space, recent works built upon it have made interesting attempts aiming at the interpretability of text modeling. 
However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by the recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.' volume: 162 URL: https://proceedings.mlr.press/v162/yu22h.html PDF: https://proceedings.mlr.press/v162/yu22h/yu22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peiyu family: Yu - given: Sirui family: Xie - given: Xiaojian family: Ma - given: Baoxiong family: Jia - given: Bo family: Pang - given: Ruiqi family: Gao - given: Yixin family: Zhu - given: Song-Chun family: Zhu - given: Ying Nian family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25702-25720 id: yu22h issued: date-parts: - 2022 - 6 - 28 firstpage: 25702 lastpage: 25720 published: 2022-06-28 00:00:00 +0000 - title: 'Predicting Out-of-Distribution Error with the Projection Norm' abstract: 'We propose a metric—Projection Norm—to predict a model’s performance on out-of-distribution (OOD) data without access to ground truth labels. Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels. The more the new model’s parameters differ from an in-distribution model, the greater the predicted OOD error. Empirically, our approach outperforms existing methods on both image and text classification tasks and across different network architectures. Theoretically, we connect our approach to a bound on the test error for overparameterized linear models. Furthermore, we find that Projection Norm is the only approach that achieves non-trivial detection performance on adversarial examples. Our code is available at \url{https://github.com/yaodongyu/ProjNorm}.' 
volume: 162 URL: https://proceedings.mlr.press/v162/yu22i.html PDF: https://proceedings.mlr.press/v162/yu22i/yu22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yu22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yaodong family: Yu - given: Zitong family: Yang - given: Alexander family: Wei - given: Yi family: Ma - given: Jacob family: Steinhardt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25721-25746 id: yu22i issued: date-parts: - 2022 - 6 - 28 firstpage: 25721 lastpage: 25746 published: 2022-06-28 00:00:00 +0000 - title: 'Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning' abstract: 'We study offline meta-reinforcement learning, a practical reinforcement learning paradigm that learns from offline data to adapt to new tasks. The distribution of offline data is determined jointly by the behavior policy and the task. Existing offline meta-reinforcement learning algorithms cannot distinguish these factors, making task representations unstable to the change of behavior policies. To address this problem, we propose a contrastive learning framework for task representations that are robust to the distribution mismatch of behavior policies in training and test. We design a bi-level encoder structure, use mutual information maximization to formalize task representation learning, derive a contrastive learning objective, and introduce several approaches to approximate the true distribution of negative pairs. Experiments on a variety of offline meta-reinforcement learning benchmarks demonstrate the advantages of our method over prior methods, especially on the generalization to out-of-distribution behavior policies.' volume: 162 URL: https://proceedings.mlr.press/v162/yuan22a.html PDF: https://proceedings.mlr.press/v162/yuan22a/yuan22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yuan22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haoqi family: Yuan - given: Zongqing family: Lu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25747-25759 id: yuan22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25747 lastpage: 25759 published: 2022-06-28 00:00:00 +0000 - title: 'Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance' abstract: 'In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that either rely on a large batch size or a large dictionary of feature vectors. We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result. 
In order to remove such requirement, we propose a memory-efficient Stochastic Optimization algorithm for solving the Global objective of Contrastive Learning of Representations, named SogCLR. We show that its optimization error is negligible under a reasonable condition after a sufficient number of iterations or is diminishing for a slightly different global contrastive objective. Empirically, we demonstrate that SogCLR with small batch size (e.g., 256) can achieve similar performance as SimCLR with large batch size (e.g., 8192) on self-supervised learning task on ImageNet-1K. We also attempt to show that the proposed optimization technique is generic and can be applied to solving other contrastive losses, e.g., two-way contrastive losses for bimodal contrastive learning. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org).' volume: 162 URL: https://proceedings.mlr.press/v162/yuan22b.html PDF: https://proceedings.mlr.press/v162/yuan22b/yuan22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yuan22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhuoning family: Yuan - given: Yuexin family: Wu - given: Zi-Hao family: Qiu - given: Xianzhi family: Du - given: Lijun family: Zhang - given: Denny family: Zhou - given: Tianbao family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25760-25782 id: yuan22b issued: date-parts: - 2022 - 6 - 28 firstpage: 25760 lastpage: 25782 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Tangent Kernel Empowered Federated Learning' abstract: 'Federated learning (FL) is a privacy-preserving paradigm where multiple participants jointly solve a machine learning problem without sharing raw data. Unlike traditional distributed learning, a unique characteristic of FL is statistical heterogeneity, namely, data distributions across participants are different from each other. Meanwhile, recent advances in the interpretation of neural networks have seen a wide use of neural tangent kernels (NTKs) for convergence analyses. In this paper, we propose a novel FL paradigm empowered by the NTK framework. The paradigm addresses the challenge of statistical heterogeneity by transmitting update data that are more expressive than those of the conventional FL paradigms. Specifically, sample-wise Jacobian matrices, rather than model weights/gradients, are uploaded by participants. The server then constructs an empirical kernel matrix to update a global model without explicitly performing gradient descent. We further develop a variant with improved communication efficiency and enhanced privacy. Numerical results show that the proposed paradigm can achieve the same accuracy while reducing the number of communication rounds by an order of magnitude compared to federated averaging.' 
volume: 162 URL: https://proceedings.mlr.press/v162/yue22a.html PDF: https://proceedings.mlr.press/v162/yue22a/yue22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yue22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Kai family: Yue - given: Richeng family: Jin - given: Ryan family: Pilgrim - given: Chau-Wai family: Wong - given: Dror family: Baron - given: Huaiyu family: Dai editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25783-25803 id: yue22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25783 lastpage: 25803 published: 2022-06-28 00:00:00 +0000 - title: 'Time Is MattEr: Temporal Self-supervision for Video Transformers' abstract: 'Understanding temporal dynamics of video is an essential aspect of learning better video representations. Recently, transformer-based architectural designs have been extensively explored for video tasks due to their capability to capture long-term dependency of input sequences. However, we found that these Video Transformers are still biased to learn spatial dynamics rather than temporal ones, and debiasing the spurious correlation is critical for their performance. Based on the observations, we design simple yet effective self-supervised tasks for video models to learn temporal dynamics better. Specifically, for debiasing the spatial bias, our method learns the temporal order of video frames as extra self-supervision and enforces the randomly shuffled frames to have low-confidence outputs. Also, our method learns the temporal flow direction of video tokens among consecutive frames for enhancing the correlation toward temporal dynamics. Under various video action recognition tasks, we demonstrate the effectiveness of our method and its compatibility with state-of-the-art Video Transformers.' volume: 162 URL: https://proceedings.mlr.press/v162/yun22a.html PDF: https://proceedings.mlr.press/v162/yun22a/yun22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-yun22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sukmin family: Yun - given: Jaehyung family: Kim - given: Dongyoon family: Han - given: Hwanjun family: Song - given: Jung-Woo family: Ha - given: Jinwoo family: Shin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25804-25816 id: yun22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25804 lastpage: 25816 published: 2022-06-28 00:00:00 +0000 - title: 'Pure Noise to the Rescue of Insufficient Data: Improving Imbalanced Classification by Training on Random Noise Images' abstract: 'Despite remarkable progress on visual recognition tasks, deep neural-nets still struggle to generalize well when training data is scarce or highly imbalanced, rendering them extremely vulnerable to real-world examples. In this paper, we present a surprisingly simple yet highly effective method to mitigate this limitation: using pure noise images as additional training data. 
Unlike the common use of additive noise or adversarial noise for data augmentation, we propose an entirely different perspective by directly training on pure random noise images. We present a new Distribution-Aware Routing Batch Normalization layer (DAR-BN), which enables training on pure noise images in addition to natural images within the same network. This encourages generalization and suppresses overfitting. Our proposed method significantly improves imbalanced classification performance, obtaining state-of-the-art results on a large variety of long-tailed image classification datasets (CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, Places-LT, and CelebA-5). Furthermore, our method is extremely simple and easy to use as a general new augmentation tool (on top of existing augmentations), and can be incorporated in any training scheme. It does not require any specialized data generation or training procedures, thus keeping training fast and efficient.' volume: 162 URL: https://proceedings.mlr.press/v162/zada22a.html PDF: https://proceedings.mlr.press/v162/zada22a/zada22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zada22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shiran family: Zada - given: Itay family: Benou - given: Michal family: Irani editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25817-25833 id: zada22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25817 lastpage: 25833 published: 2022-06-28 00:00:00 +0000 - title: 'Adaptive Conformal Predictions for Time Series' abstract: 'Uncertainty quantification of predictive models is crucial in decision-making problems. Conformal prediction is a general and theoretically sound answer. However, it requires exchangeable data, excluding time series. While recent works tackled this issue, we argue that Adaptive Conformal Inference (ACI, Gibbs & Candès, 2021), developed for distribution-shift time series, is a good procedure for time series with general dependency. We theoretically analyse the impact of the learning rate on its efficiency in the exchangeable and auto-regressive case. We propose a parameter-free method, AgACI, that adaptively builds upon ACI based on online expert aggregation. We run extensive, fair simulations against competing methods that advocate for ACI’s use in time series. We conduct a real case study: electricity price forecasting. The proposed aggregation algorithm provides efficient prediction intervals for day-ahead forecasting. All the code and data to reproduce the experiments are made available on GitHub.'
volume: 162 URL: https://proceedings.mlr.press/v162/zaffran22a.html PDF: https://proceedings.mlr.press/v162/zaffran22a/zaffran22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zaffran22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Margaux family: Zaffran - given: Olivier family: Feron - given: Yannig family: Goude - given: Julie family: Josse - given: Aymeric family: Dieuleveut editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25834-25866 id: zaffran22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25834 lastpage: 25866 published: 2022-06-28 00:00:00 +0000 - title: 'Actor-Critic based Improper Reinforcement Learning' abstract: 'We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a given target environment with relatively few trials. Towards this, we propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic (AC) based scheme and a Natural Actor-Critic (NAC) scheme depending on the available information. Both algorithms operate over a class of improper mixtures of the given controllers. For the first case, we derive convergence rate guarantees assuming access to a gradient oracle. For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case. Numerical results on (i) the standard control theoretic benchmark of stabilizing an inverted pendulum; and (ii) a constrained queueing task show that our improper policy optimization algorithm can stabilize the system even when the base policies at its disposal are unstable.' volume: 162 URL: https://proceedings.mlr.press/v162/zaki22a.html PDF: https://proceedings.mlr.press/v162/zaki22a/zaki22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zaki22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mohammadi family: Zaki - given: Avi family: Mohan - given: Aditya family: Gopalan - given: Shie family: Mannor editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25867-25919 id: zaki22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25867 lastpage: 25919 published: 2022-06-28 00:00:00 +0000 - title: 'Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning' abstract: 'The Q-learning algorithm is a simple, fundamental and practically very effective reinforcement learning algorithm. However, the basic protocol can exhibit an unstable behavior when implemented even with simple linear function approximation. 
While tools like target networks and experience replay are often implemented to stabilize the learning process, the individual contribution of each of these mechanisms is not well understood theoretically. This work proposes an exploration variant of the basic Q-learning protocol with linear function approximation. Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second-order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state-of-the-art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of steps elapsed. Furthermore, we show that the performance of the algorithm degrades very gracefully under a new, more permissive notion of approximation error. Finally, the algorithm partially inherits problem-dependent regret bounds, expressed as a function of the number of ‘effective’ feature dimensions.' volume: 162 URL: https://proceedings.mlr.press/v162/zanette22a.html PDF: https://proceedings.mlr.press/v162/zanette22a/zanette22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zanette22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrea family: Zanette - given: Martin family: Wainwright editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25920-25954 id: zanette22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25920 lastpage: 25954 published: 2022-06-28 00:00:00 +0000 - title: 'Multi Resolution Analysis (MRA) for Approximate Self-Attention' abstract: 'Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback, and design choices informed by modern hardware and implementation challenges, eventually yield an MRA-based approach for self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at \url{https://github.com/mlpen/mra-attention}.'
volume: 162 URL: https://proceedings.mlr.press/v162/zeng22a.html PDF: https://proceedings.mlr.press/v162/zeng22a/zeng22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zeng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhanpeng family: Zeng - given: Sourav family: Pal - given: Jeffery family: Kline - given: Glenn M family: Fung - given: Vikas family: Singh editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25955-25972 id: zeng22a issued: date-parts: - 2022 - 6 - 28 firstpage: 25955 lastpage: 25972 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient PAC Learning from the Crowd with Pairwise Comparisons' abstract: 'We study crowdsourced PAC learning of threshold functions, where the labels are gathered from a pool of annotators, some of whom may behave adversarially. This remains a challenging problem, and only recently was a computationally and query-efficient PAC learning algorithm established by Awasthi et al. (2017). In this paper, we show that by leveraging the more easily acquired pairwise comparison queries, it is possible to exponentially reduce the label complexity while retaining the overall query complexity and runtime. Our main algorithmic contributions are a comparison-equipped labeling scheme that can faithfully recover the true labels of a small set of instances, and a label-efficient filtering process that, in conjunction with the small labeled set, can reliably infer the true labels of a large instance set.' volume: 162 URL: https://proceedings.mlr.press/v162/zeng22b.html PDF: https://proceedings.mlr.press/v162/zeng22b/zeng22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zeng22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shiwei family: Zeng - given: Jie family: Shen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25973-25993 id: zeng22b issued: date-parts: - 2022 - 6 - 28 firstpage: 25973 lastpage: 25993 published: 2022-06-28 00:00:00 +0000 - title: 'Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts' abstract: 'Most existing methods in vision language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts. It is challenging for these methods to learn relations among multiple objects. To this end, we propose a new method called X-VLM to perform ‘multi-grained vision language pre-training.’ The key to learning multi-grained alignments is to locate visual concepts in the image given the associated texts, and in the meantime align the texts with the visual concepts, where the alignments are in multi-granularity. Experimental results show that X-VLM effectively leverages the learned multi-grained alignments in many downstream vision language tasks and consistently outperforms state-of-the-art methods.'
volume: 162 URL: https://proceedings.mlr.press/v162/zeng22c.html PDF: https://proceedings.mlr.press/v162/zeng22c/zeng22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zeng22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yan family: Zeng - given: Xinsong family: Zhang - given: Hang family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 25994-26009 id: zeng22c issued: date-parts: - 2022 - 6 - 28 firstpage: 25994 lastpage: 26009 published: 2022-06-28 00:00:00 +0000 - title: 'Position Prediction as an Effective Pretraining Strategy' abstract: 'Transformers \cite{transformer} have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Transformer has been unlocked by self-supervised pretraining strategies based on masked autoencoders which rely on reconstructing masked inputs, directly, or contrastively from unmasked content. This pretraining strategy which has been used in BERT models in NLP \cite{bert}, Wav2Vec models in Speech \cite{wv2v2} and, recently, in MAE models in Vision \cite{beit, mae}, forces the model to learn about relationships between the content in different parts of the input using autoencoding related objectives. In this paper, we propose a novel, but surprisingly simple alternative to content reconstruction – that of predicting locations from content, without providing positional information for it. Doing so requires the Transformer to understand the positional relationships between different parts of the input, from their content alone. This amounts to an efficient implementation where the pretext task is a classification problem among all possible positions for each input token. We experiment on both Vision and Speech benchmarks, where our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods. Our method also enables Transformers trained without position embeddings to outperform ones trained with full position information.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhai22a.html PDF: https://proceedings.mlr.press/v162/zhai22a/zhai22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhai22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shuangfei family: Zhai - given: Navdeep family: Jaitly - given: Jason family: Ramapuram - given: Dan family: Busbridge - given: Tatiana family: Likhomanenko - given: Joseph Y family: Cheng - given: Walter family: Talbott - given: Chen family: Huang - given: Hanlin family: Goh - given: Joshua M family: Susskind editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26010-26027 id: zhai22a issued: date-parts: - 2022 - 6 - 28 firstpage: 26010 lastpage: 26027 published: 2022-06-28 00:00:00 +0000 - title: 'Anytime Information Cascade Popularity Prediction via Self-Exciting Processes' abstract: 'One important aspect of understanding behaviors of information cascades is to be able to accurately predict their popularity, that is, their message counts at any future time. Self-exciting Hawkes processes have been widely adopted for such tasks due to their success in describing cascading behaviors. In this paper, for general, marked Hawkes point processes, we present closed-form expressions for the mean and variance of future event counts, conditioned on observed events. Furthermore, these expressions allow us to develop a predictive approach, namely, Cascade Anytime Size Prediction via self-Exciting Regression model (CASPER), which is specifically tailored to popularity prediction, unlike existing generative approaches – based on point processes – for the same task. We showcase CASPER’s merits via experiments entailing both synthetic and real-world data, and demonstrate that it considerably improves upon prior works in terms of accuracy, especially for early-stage prediction.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22a.html PDF: https://proceedings.mlr.press/v162/zhang22a/zhang22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xi family: Zhang - given: Akshay family: Aravamudan - given: Georgios C family: Anagnostopoulos editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26028-26047 id: zhang22a issued: date-parts: - 2022 - 6 - 28 firstpage: 26028 lastpage: 26047 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy' abstract: 'Providing privacy protection has been one of the primary motivations of Federated Learning (FL). Recently, there has been a line of work on incorporating the formal privacy notion of differential privacy with FL. To guarantee client-level differential privacy in FL algorithms, the clients’ transmitted model updates have to be clipped before adding privacy noise.
Such a clipping operation is substantially different from its counterpart of gradient clipping in centralized differentially private SGD and has not been well understood. In this paper, we first empirically demonstrate that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity when training neural networks, which is partly because the clients’ updates become similar for several popular deep architectures. Based on this key observation, we provide the convergence analysis of a differentially private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients’ updates. To the best of our knowledge, this is the first work that rigorously investigates theoretical and empirical issues regarding the clipping operation in FL algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22b.html PDF: https://proceedings.mlr.press/v162/zhang22b/zhang22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xinwei family: Zhang - given: Xiangyi family: Chen - given: Mingyi family: Hong - given: Steven family: Wu - given: Jinfeng family: Yi editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26048-26067 id: zhang22b issued: date-parts: - 2022 - 6 - 28 firstpage: 26048 lastpage: 26067 published: 2022-06-28 00:00:00 +0000 - title: 'Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs' abstract: 'In this paper, we propose a Collaboration of Experts (CoE) framework to assemble the expertise of multiple networks towards a common goal. Each expert is an individual network with expertise on a unique portion of the dataset, contributing to the collective capacity. Given a sample, the delegator selects an expert and simultaneously outputs a rough prediction to trigger potential early termination. For each model in CoE, we propose a novel training algorithm with two major components: a weight generation module (WGM) and a label generation module (LGM). It fulfills the co-adaptation of the experts and the delegator. WGM partitions the training data into portions based on the delegator by solving a balanced transportation problem, then impels each expert to focus on one portion by reweighting the losses. LGM generates the label to constitute the loss of the delegator for expert selection. CoE achieves state-of-the-art performance on ImageNet, 80.7% top-1 accuracy with 194M FLOPs. Combined with PWLU and CondConv, CoE further boosts the accuracy to 80.0% with only 100M FLOPs for the first time. Furthermore, experiment results on the translation task also demonstrate the strong generalizability of CoE. CoE is hardware-friendly, yielding a 3-6x acceleration compared with existing conditional computation approaches.'
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22c.html PDF: https://proceedings.mlr.press/v162/zhang22c/zhang22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yikang family: Zhang - given: Zhuo family: Chen - given: Zhao family: Zhong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26068-26084 id: zhang22c issued: date-parts: - 2022 - 6 - 28 firstpage: 26068 lastpage: 26084 published: 2022-06-28 00:00:00 +0000 - title: 'PDE-Based Optimal Strategy for Unconstrained Online Learning' abstract: 'Unconstrained Online Linear Optimization (OLO) is a practical problem setting to study the training of machine learning models. Existing works proposed a number of potential-based algorithms, but in general the design of these potential functions relies heavily on guessing. To streamline this workflow, we present a framework that generates new potential functions by solving a Partial Differential Equation (PDE). Specifically, when losses are 1-Lipschitz, our framework produces a novel algorithm with anytime regret bound $C\sqrt{T}+||u||\sqrt{2T}[\sqrt{\log(1+||u||/C)}+2]$, where $C$ is a user-specified constant and $u$ is any comparator unknown and unbounded a priori. Such a bound attains an optimal loss-regret trade-off without the impractical doubling trick. Moreover, a matching lower bound shows that the leading order term, including the constant multiplier $\sqrt{2}$, is tight. To our knowledge, the proposed algorithm is the first to achieve such optimalities.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22d.html PDF: https://proceedings.mlr.press/v162/zhang22d/zhang22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhiyu family: Zhang - given: Ashok family: Cutkosky - given: Ioannis family: Paschalidis editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26085-26115 id: zhang22d issued: date-parts: - 2022 - 6 - 28 firstpage: 26085 lastpage: 26115 published: 2022-06-28 00:00:00 +0000 - title: 'Stochastic Continuous Submodular Maximization: Boosting via Non-oblivious Function' abstract: 'In this paper, we revisit Stochastic Continuous Submodular Maximization in both offline and online settings, which can benefit wide applications in machine learning and operations research areas. We present a boosting framework covering gradient ascent and online gradient ascent. The fundamental ingredient of our methods is a novel non-oblivious function $F$ derived from a factor-revealing optimization problem, whose any stationary point provides a $(1-e^{-\gamma})$-approximation to the global maximum of the $\gamma$-weakly DR-submodular objective function $f\in C^{1,1}_L(\mathcal{X})$. 
Under the offline scenario, we propose a boosting gradient ascent method achieving $(1-e^{-\gamma}-\epsilon^{2})$-approximation after $O(1/\epsilon^2)$ iterations, which improves the $(\frac{\gamma^2}{1+\gamma^2})$ approximation ratio of the classical gradient ascent algorithm. In the online setting, for the first time we consider the adversarial delays for stochastic gradient feedback, under which we propose a boosting online gradient algorithm with the same non-oblivious function $F$. Meanwhile, we verify that this boosting online algorithm achieves a regret of $O(\sqrt{D})$ against a $(1-e^{-\gamma})$-approximation to the best feasible solution in hindsight, where $D$ is the sum of delays of gradient feedback. To the best of our knowledge, this is the first result to obtain $O(\sqrt{T})$ regret against a $(1-e^{-\gamma})$-approximation with $O(1)$ gradient inquiry at each time step, when no delay exists, i.e., $D=T$. Finally, numerical experiments demonstrate the effectiveness of our boosting methods.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22e.html PDF: https://proceedings.mlr.press/v162/zhang22e/zhang22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qixin family: Zhang - given: Zengde family: Deng - given: Zaiyi family: Chen - given: Haoyuan family: Hu - given: Yu family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26116-26134 id: zhang22e issued: date-parts: - 2022 - 6 - 28 firstpage: 26116 lastpage: 26134 published: 2022-06-28 00:00:00 +0000 - title: 'When and How Mixup Improves Calibration' abstract: 'In many machine learning applications, it is important for the model to provide confidence scores that accurately capture its prediction uncertainty. Although modern learning methods have achieved great success in predictive accuracy, generating calibrated confidence scores remains a major challenge. Mixup, a popular yet simple data augmentation technique based on taking convex combinations of pairs of training examples, has been empirically found to significantly improve confidence calibration across diverse applications. However, when and how Mixup helps calibration is still a mystery. In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating natural statistical models. Interestingly, the calibration benefit of Mixup increases as the model capacity increases. We support our theories with experiments on common architectures and datasets. In addition, we study how Mixup improves calibration in semi-supervised learning. While incorporating unlabeled data can sometimes make the model less calibrated, adding Mixup training mitigates this issue and provably improves calibration. Our analysis provides new insights and a framework to understand Mixup and calibration.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22f.html PDF: https://proceedings.mlr.press/v162/zhang22f/zhang22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Linjun family: Zhang - given: Zhun family: Deng - given: Kenji family: Kawaguchi - given: James family: Zou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26135-26160 id: zhang22f issued: date-parts: - 2022 - 6 - 28 firstpage: 26135 lastpage: 26160 published: 2022-06-28 00:00:00 +0000 - title: 'UAST: Uncertainty-Aware Siamese Tracking' abstract: 'Visual object tracking is basically formulated as target classification and bounding box estimation. Recent anchor-free Siamese trackers rely on predicting the distances to four sides for efficient regression but fail to estimate accurate bounding box in complex scenes. We argue that these approaches lack a clear probabilistic explanation, so it is desirable to model the uncertainty and ambiguity representation of target estimation. To address this issue, this paper presents an Uncertainty-Aware Siamese Tracker (UAST) by developing a novel distribution-based regression formulation with localization uncertainty. We exploit regression vectors to directly represent the discretized probability distribution for four offsets of boxes, which is general, flexible and informative. Based on the resulting distributed representation, our method is able to provide a probabilistic value of uncertainty. Furthermore, considering the high correlation between the uncertainty and regression accuracy, we propose to learn a joint representation head of classification and localization quality for reliable tracking, which also avoids the inconsistency of classification and quality estimation between training and inference. Extensive experiments on several challenging tracking benchmarks demonstrate the effectiveness of UAST and its superiority over other Siamese trackers.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22g.html PDF: https://proceedings.mlr.press/v162/zhang22g/zhang22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dawei family: Zhang - given: Yanwei family: Fu - given: Zhonglong family: Zheng editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26161-26175 id: zhang22g issued: date-parts: - 2022 - 6 - 28 firstpage: 26161 lastpage: 26175 published: 2022-06-28 00:00:00 +0000 - title: 'Examining Scaling and Transfer of Language Model Architectures for Machine Translation' abstract: 'Natural language understanding and generation models follow one of the two dominant architectural paradigms: language models (LMs) that process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec) that utilize separate layer stacks for input and output processing. 
In machine translation, EncDec has long been the favoured approach, but with few studies investigating the performance of LMs. In this work, we thoroughly examine the role of several architectural design choices on the performance of LMs on bilingual, (massively) multilingual and zero-shot translation tasks, under systematic variations of data conditions and model sizes. Our results show that: (i) Different LMs have different scaling properties, where architectural differences often have a significant impact on model performance at small scales, but the performance gap narrows as the number of parameters increases, (ii) Several design choices, including causal masking and language-modeling objectives for the source sequence, have detrimental effects on translation quality, and (iii) When paired with full-visible masking for source sequences, LMs could perform on par with EncDec on supervised bilingual and multilingual translation tasks, and improve greatly on zero-shot directions by facilitating the reduction of off-target translations.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22h.html PDF: https://proceedings.mlr.press/v162/zhang22h/zhang22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Biao family: Zhang - given: Behrooz family: Ghorbani - given: Ankur family: Bapna - given: Yong family: Cheng - given: Xavier family: Garcia - given: Jonathan family: Shen - given: Orhan family: Firat editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26176-26192 id: zhang22h issued: date-parts: - 2022 - 6 - 28 firstpage: 26176 lastpage: 26192 published: 2022-06-28 00:00:00 +0000 - title: 'Revisiting End-to-End Speech-to-Text Translation From Scratch' abstract: 'End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its encoder and/or decoder using source transcripts via speech recognition or text translation tasks, without which translation performance drops substantially. However, transcripts are not always available, and how significant such pretraining is for E2E ST has rarely been studied in the literature. In this paper, we revisit this question and explore the extent to which the quality of E2E ST trained on speech-translation pairs alone can be improved. We reexamine several techniques proven beneficial to ST previously, and offer a set of best practices that biases a Transformer-based E2E ST system toward training from scratch. Besides, we propose parameterized distance penalty to facilitate the modeling of locality in the self-attention model for speech. On four benchmarks covering 23 languages, our experiments show that, without using any transcripts or pretraining, the proposed system reaches and even outperforms previous studies adopting pretraining, although the gap remains in (extremely) low-resource settings. Finally, we discuss neural acoustic feature modeling, where a neural model is designed to extract acoustic features from raw speech signals directly, with the goal to simplify inductive biases and add freedom to the model in describing speech. For the first time, we demonstrate its feasibility and show encouraging results on ST tasks.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22i.html PDF: https://proceedings.mlr.press/v162/zhang22i/zhang22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Biao family: Zhang - given: Barry family: Haddow - given: Rico family: Sennrich editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26193-26205 id: zhang22i issued: date-parts: - 2022 - 6 - 28 firstpage: 26193 lastpage: 26205 published: 2022-06-28 00:00:00 +0000 - title: 'A Stochastic Multi-Rate Control Framework For Modeling Distributed Optimization Algorithms' abstract: 'In modern machine learning systems, distributed algorithms are deployed across applications to ensure data privacy and optimal utilization of computational resources. This work offers a fresh perspective to model, analyze, and design distributed optimization algorithms through the lens of stochastic multi-rate feedback control. We show that a substantial class of distributed algorithms—including popular Gradient Tracking for decentralized learning, and FedPD and Scaffold for federated learning—can be modeled as a certain discrete-time stochastic feedback-control system, possibly with multiple sampling rates. This key observation allows us to develop a generic framework to analyze the convergence of the entire algorithm class. It also enables one to easily add desirable features such as differential privacy guarantees, or to deal with practical settings such as partial agent participation, communication compression, and imperfect communication in algorithm design and analysis.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22j.html PDF: https://proceedings.mlr.press/v162/zhang22j/zhang22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xinwei family: Zhang - given: Mingyi family: Hong - given: Sairaj family: Dhople - given: Nicola family: Elia editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26206-26222 id: zhang22j issued: date-parts: - 2022 - 6 - 28 firstpage: 26206 lastpage: 26222 published: 2022-06-28 00:00:00 +0000 - title: 'GALAXY: Graph-based Active Learning at the Extreme' abstract: 'Active learning is a label-efficient approach to train highly effective models while interactively selecting only small subsets of unlabelled data for labelling and training. In “open world" settings, the classes of interest can make up a small fraction of the overall dataset – most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class-imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. 
GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other methods for active learning. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY’s superiority over existing state-of-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22k.html PDF: https://proceedings.mlr.press/v162/zhang22k/zhang22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jifan family: Zhang - given: Julian family: Katz-Samuels - given: Robert family: Nowak editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26223-26238 id: zhang22k issued: date-parts: - 2022 - 6 - 28 firstpage: 26223 lastpage: 26238 published: 2022-06-28 00:00:00 +0000 - title: 'Fairness Interventions as (Dis)Incentives for Strategic Manipulation' abstract: 'Although machine learning (ML) algorithms are widely used to make decisions about individuals in various domains, concerns have arisen that (1) these algorithms are vulnerable to strategic manipulation and "gaming the algorithm"; and (2) ML decisions may exhibit bias against certain social groups. Existing works have largely examined these as two separate issues, e.g., by focusing on building ML algorithms robust to strategic manipulation, or on training a fair ML algorithm. In this study, we set out to understand the impact they each have on the other, and examine how to characterize fair policies in the presence of strategic behavior. The strategic interaction between a decision maker and individuals (as decision takers) is modeled as a two-stage (Stackelberg) game; when designing an algorithm, the former anticipates the latter may manipulate their features in order to receive more favorable decisions. We analytically characterize the equilibrium strategies of both, and examine how the algorithms and their resulting fairness properties are affected when the decision maker is strategic (anticipates manipulation), as well as the impact of fairness interventions on equilibrium strategies. In particular, we identify conditions under which anticipation of strategic behavior may mitigate/exacerbate unfairness, and conditions under which fairness interventions can serve as (dis)incentives for strategic manipulation.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22l.html PDF: https://proceedings.mlr.press/v162/zhang22l/zhang22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xueru family: Zhang - given: Mohammad Mahdi family: Khalili - given: Kun family: Jin - given: Parinaz family: Naghizadeh - given: Mingyan family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26239-26264 id: zhang22l issued: date-parts: - 2022 - 6 - 28 firstpage: 26239 lastpage: 26264 published: 2022-06-28 00:00:00 +0000 - title: 'Role-based Multiplex Network Embedding' abstract: 'In recent years, multiplex network embedding has received great attention from researchers. However, existing multiplex network embedding methods neglect structural role information, which can be used to determine the structural similarity between nodes. To overcome this shortcoming, this work proposes a simple, effective, role-based embedding method for multiplex networks, called RMNE. RMNE uses the structural role information of nodes to preserve the structural similarity between nodes in the entire multiplex network. Specifically, a role-modified random walk is designed to generate node sequences for each node, which can capture the within-layer neighbors, structural role members, and cross-layer structural role members of a node. Additionally, a variant of RMNE extends the existing collaborative embedding method by incorporating the structural role information into our method to obtain role-based node representations. Finally, the proposed methods were evaluated on the network reconstruction, node classification, link prediction, and multi-class edge classification tasks. The experimental results on eight public, real-world multiplex networks demonstrate that the proposed methods outperform state-of-the-art baseline methods.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22m.html PDF: https://proceedings.mlr.press/v162/zhang22m/zhang22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hegui family: Zhang - given: Gang family: Kou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26265-26280 id: zhang22m issued: date-parts: - 2022 - 6 - 28 firstpage: 26265 lastpage: 26280 published: 2022-06-28 00:00:00 +0000 - title: 'Dynamic Topic Models for Temporal Document Networks' abstract: 'Dynamic topic models explore the time evolution of topics in temporally accumulative corpora. While existing topic models focus on the dynamics of individual documents, we propose two neural topic models aimed at learning unified topic distributions that incorporate both document dynamics and network structure. For the first model, by adding a time dimension, we propose Time-Aware Optimal Transport, which measures the probability of a link between two differently timestamped documents using their semantic distance.
Since the gradually evolving topological structure of network may also influence the establishment of a new link, for the second model, we further design a Temporal Point Process to capture the impact of historical neighbors on the current link formation at the network level. Experiments on four dynamic document networks demonstrate the advantage of our models in jointly modeling document dynamics and network adjacency.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22n.html PDF: https://proceedings.mlr.press/v162/zhang22n/zhang22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Delvin Ce family: Zhang - given: Hady family: Lauw editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26281-26292 id: zhang22n issued: date-parts: - 2022 - 6 - 28 firstpage: 26281 lastpage: 26292 published: 2022-06-28 00:00:00 +0000 - title: 'Personalized Federated Learning via Variational Bayesian Inference' abstract: 'Federated learning faces huge challenges from model overfitting due to the lack of data and statistical diversity among clients. To address these challenges, this paper proposes a novel personalized federated learning method via Bayesian variational inference named pFedBayes. To alleviate the overfitting, weight uncertainty is introduced to neural networks for clients and the server. To achieve personalization, each client updates its local distribution parameters by balancing its construction error over private data and its KL divergence with global distribution from the server. Theoretical analysis gives an upper bound of averaged generalization error and illustrates that the convergence rate of the generalization error is minimax optimal up to a logarithmic factor. Experiments show that the proposed method outperforms other advanced personalized methods on personalized models, e.g., pFedBayes respectively outperforms other SOTA algorithms by 1.25%, 0.42% and 11.71% on MNIST, FMNIST and CIFAR-10 under non-i.i.d. limited data.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22o.html PDF: https://proceedings.mlr.press/v162/zhang22o/zhang22o.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22o.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xu family: Zhang - given: Yinchuan family: Li - given: Wenpeng family: Li - given: Kaiyang family: Guo - given: Yunfeng family: Shao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26293-26310 id: zhang22o issued: date-parts: - 2022 - 6 - 28 firstpage: 26293 lastpage: 26310 published: 2022-06-28 00:00:00 +0000 - title: 'Federated Learning with Label Distribution Skew via Logits Calibration' abstract: 'Traditional federated optimization methods perform poorly with heterogeneous data (i.e. , accuracy reduction), especially for highly skewed data. In this paper, we investigate the label distribution skew in FL, where the distribution of labels varies across clients. 
First, we investigate the label distribution skew from a statistical view. We demonstrate both theoretically and empirically that previous methods based on softmax cross-entropy are not suitable, which can result in local models heavily overfitting to minority classes and missing classes. Additionally, we theoretically introduce a deviation bound to measure the deviation of the gradient after local update. At last, we propose FedLC (\textbf{Fed}erated learning via \textbf{L}ogits \textbf{C}alibration), which calibrates the logits before softmax cross-entropy according to the probability of occurrence of each class. FedLC applies a fine-grained calibrated cross-entropy loss to local update by adding a pairwise label margin. Extensive experiments on federated datasets and real-world datasets demonstrate that FedLC leads to a more accurate global model and much improved performance. Furthermore, integrating other FL methods into our approach can further enhance the performance of the global model.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22p.html PDF: https://proceedings.mlr.press/v162/zhang22p/zhang22p.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22p.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jie family: Zhang - given: Zhiqi family: Li - given: Bo family: Li - given: Jianghe family: Xu - given: Shuang family: Wu - given: Shouhong family: Ding - given: Chao family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26311-26329 id: zhang22p issued: date-parts: - 2022 - 6 - 28 firstpage: 26311 lastpage: 26329 published: 2022-06-28 00:00:00 +0000 - title: 'Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective' abstract: 'This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks. Specifically, we provide numerical evidence that in large-scale neural network training (e.g., ImageNet + ResNet101, and WT103 + TransformerXL models), the neural network’s weights do not converge to stationary points where the gradient of the loss is zero. Remarkably, however, we observe that even though the weights do not converge to stationary points, the progress in minimizing the loss function halts and training loss stabilizes. Inspired by this observation, we propose a new perspective based on ergodic theory of dynamical systems to explain it. Rather than studying the evolution of weights, we study the evolution of the distribution of weights. We prove convergence of the distribution of weights to an approximate invariant measure, thereby explaining how the training loss can stabilize without weights necessarily converging to stationary points. We further discuss how this perspective can better align optimization theory with empirical observations in machine learning practice.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22q.html PDF: https://proceedings.mlr.press/v162/zhang22q/zhang22q.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22q.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jingzhao family: Zhang - given: Haochuan family: Li - given: Suvrit family: Sra - given: Ali family: Jadbabaie editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26330-26346 id: zhang22q issued: date-parts: - 2022 - 6 - 28 firstpage: 26330 lastpage: 26346 published: 2022-06-28 00:00:00 +0000 - title: 'Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity' abstract: 'We study oracle complexity of gradient based methods for stochastic approximation problems. Though in many settings optimal algorithms and tight lower bounds are known for such problems, these optimal algorithms do not achieve the best performance when used in practice. We address this theory-practice gap by focusing on instance-dependent complexity instead of worst case complexity. In particular, we first summarize known instance-dependent complexity results and categorize them into three levels. We identify the domination relation between different levels and propose a fourth instance-dependent bound that dominates existing ones. We then provide a sufficient condition according to which an adaptive algorithm with moment estimation can achieve the proposed bound without knowledge of noise levels. Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation as it achieves improved instance complexity.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22r.html PDF: https://proceedings.mlr.press/v162/zhang22r/zhang22r.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22r.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jingzhao family: Zhang - given: Hongzhou family: Lin - given: Subhro family: Das - given: Suvrit family: Sra - given: Ali family: Jadbabaie editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26347-26361 id: zhang22r issued: date-parts: - 2022 - 6 - 28 firstpage: 26347 lastpage: 26361 published: 2022-06-28 00:00:00 +0000 - title: 'Deep and Flexible Graph Neural Architecture Search' abstract: 'Graph neural networks (GNNs) have been intensively applied to various graph-based applications. Despite their success, designing good GNN architectures is non-trivial and heavily relies on extensive human effort and domain knowledge. Although several attempts have been made in graph neural architecture search, they suffer from the following limitations: 1) fixed pipeline pattern of propagation (P) and transformation (T) operations; 2) restricted pipeline depth of GNN architectures. This paper proposes DFG-NAS, a novel method that searches for deep and flexible GNN architectures.
Unlike most existing methods that focus on micro-architecture, DFG-NAS highlights another level of design: the search for macro-architectures that determine how atomic P and T operations are integrated and organized into a GNN. Concretely, DFG-NAS proposes a newly designed search space for P-T permutations and combinations based on message-passing disaggregation, defines various mutation strategies, and employs an evolutionary algorithm to conduct an efficient and effective search. Empirical studies on four benchmark datasets demonstrate that DFG-NAS can find more powerful architectures than state-of-the-art manual designs while being more efficient than current graph neural architecture search approaches.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22s.html PDF: https://proceedings.mlr.press/v162/zhang22s/zhang22s.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22s.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wentao family: Zhang - given: Zheyu family: Lin - given: Yu family: Shen - given: Yang family: Li - given: Zhi family: Yang - given: Bin family: Cui editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26362-26374 id: zhang22s issued: date-parts: - 2022 - 6 - 28 firstpage: 26362 lastpage: 26374 published: 2022-06-28 00:00:00 +0000 - title: 'A Langevin-like Sampler for Discrete Distributions' abstract: 'We propose discrete Langevin proposal (DLP), a simple and scalable gradient-based proposal for sampling complex high-dimensional discrete distributions. In contrast to Gibbs sampling-based methods, DLP is able to update all coordinates in parallel in a single step and the magnitude of changes is controlled by a stepsize. This allows a cheap and efficient exploration in the space of high-dimensional and strongly correlated variables. We prove the efficiency of DLP by showing that the asymptotic bias of its stationary distribution is zero for log-quadratic distributions, and is small for distributions that are close to being log-quadratic. With DLP, we develop several variants of sampling algorithms, including unadjusted, Metropolis-adjusted, stochastic and preconditioned versions. DLP outperforms many popular alternatives on a wide variety of tasks, including Ising models, restricted Boltzmann machines, deep energy-based models, binary neural networks and language generation.'
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22t.html PDF: https://proceedings.mlr.press/v162/zhang22t/zhang22t.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22t.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruqi family: Zhang - given: Xingchao family: Liu - given: Qiang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26375-26396 id: zhang22t issued: date-parts: - 2022 - 6 - 28 firstpage: 26375 lastpage: 26396 published: 2022-06-28 00:00:00 +0000 - title: 'Rich Feature Construction for the Optimization-Generalization Dilemma' abstract: 'There often is a dilemma between ease of optimization and robust out-of-distribution (OoD) generalization. For instance, many OoD methods rely on penalty terms whose optimization is challenging. They are either too strong to optimize reliably or too weak to achieve their goals. We propose to initialize the networks with a rich representation containing a palette of potentially useful features, ready to be used by even simple models. On the one hand, a rich representation provides a good initialization for the optimizer. On the other hand, it also provides an inductive bias that helps OoD generalization. Such a representation is constructed with the Rich Feature Construction (RFC) algorithm, also called the Bonsai algorithm, which consists of a succession of training episodes. During discovery episodes, we craft a multi-objective optimization criterion and its associated datasets in a manner that prevents the network from using the features constructed in the previous iterations. During synthesis episodes, we use knowledge distillation to force the network to simultaneously represent all the previously discovered features. Initializing the networks with Bonsai representations consistently helps six OoD methods achieve top performance on ColoredMNIST benchmark. The same technique substantially outperforms comparable results on the Wilds Camelyon17 task, eliminates the high result variance that plagues other methods, and makes hyperparameter tuning and model selection more reliable.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22u.html PDF: https://proceedings.mlr.press/v162/zhang22u/zhang22u.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22u.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jianyu family: Zhang - given: David family: Lopez-Paz - given: Leon family: Bottou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26397-26411 id: zhang22u issued: date-parts: - 2022 - 6 - 28 firstpage: 26397 lastpage: 26411 published: 2022-06-28 00:00:00 +0000 - title: 'Generative Flow Networks for Discrete Probabilistic Modeling' abstract: 'We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data. 
Building upon the theory of generative flow networks (GFlowNets), we model the generation process by a stochastic data construction policy and thus amortize expensive MCMC exploration into a fixed number of actions sampled from a GFlowNet. We show how GFlowNets can approximately perform large-block Gibbs sampling to mix between modes. We propose a framework to jointly train a GFlowNet with an energy function, so that the GFlowNet learns to sample from the energy distribution, while the energy learns with an approximate MLE objective with negative samples from the GFlowNet. We demonstrate EB-GFN’s effectiveness on various probabilistic modeling tasks. Code is publicly available at https://github.com/zdhNarsil/EB_GFN.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22v.html PDF: https://proceedings.mlr.press/v162/zhang22v/zhang22v.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22v.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dinghuai family: Zhang - given: Nikolay family: Malkin - given: Zhen family: Liu - given: Alexandra family: Volokhova - given: Aaron family: Courville - given: Yoshua family: Bengio editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26412-26428 id: zhang22v issued: date-parts: - 2022 - 6 - 28 firstpage: 26412 lastpage: 26428 published: 2022-06-28 00:00:00 +0000 - title: 'Neurotoxin: Durable Backdoors in Federated Learning' abstract: 'Federated learning (FL) systems have an inherent vulnerability to adversarial backdoor attacks during training due to their decentralized nature. The goal of the attacker is to implant backdoors in the learned model with poisoned updates such that at test time, the model’s outputs can be fixed to a given target for certain inputs (e.g., if a user types “people from New York” into a mobile keyboard app that uses a backdoored next word prediction model, the model will autocomplete their sentence to “people in New York are rude”). Prior work has shown that backdoors can be inserted in FL, but these backdoors are not durable: they do not remain in the model after the attacker stops uploading poisoned updates because training continues, and in production FL systems an inserted backdoor may not survive until deployment. We propose Neurotoxin, a simple one-line backdoor attack that functions by attacking parameters that are changed less in magnitude during training. We conduct an exhaustive evaluation across ten natural language processing and computer vision tasks and find that we can double the durability of state of the art backdoors by adding a single line with Neurotoxin.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22w.html PDF: https://proceedings.mlr.press/v162/zhang22w/zhang22w.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22w.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhengming family: Zhang - given: Ashwinee family: Panda - given: Linyue family: Song - given: Yaoqing family: Yang - given: Michael family: Mahoney - given: Prateek family: Mittal - given: Ramchandran family: Kannan - given: Joseph family: Gonzalez editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26429-26446 id: zhang22w issued: date-parts: - 2022 - 6 - 28 firstpage: 26429 lastpage: 26446 published: 2022-06-28 00:00:00 +0000 - title: 'Making Linear MDPs Practical via Contrastive Representation Learning' abstract: 'It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism in the face of uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22x.html PDF: https://proceedings.mlr.press/v162/zhang22x/zhang22x.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22x.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianjun family: Zhang - given: Tongzheng family: Ren - given: Mengjiao family: Yang - given: Joseph family: Gonzalez - given: Dale family: Schuurmans - given: Bo family: Dai editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26447-26466 id: zhang22x issued: date-parts: - 2022 - 6 - 28 firstpage: 26447 lastpage: 26466 published: 2022-06-28 00:00:00 +0000 - title: 'NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning' abstract: 'Recently, graph neural networks (GNNs) have shown prominent performance in graph representation learning by leveraging knowledge from both graph structure and node features. However, most of them have two major limitations. 
First, GNNs can learn higher-order structural information by stacking more layers but can not deal with large depth due to the over-smoothing issue. Second, it is not easy to apply these methods on large graphs due to the expensive computation cost and high memory usage. In this paper, we present node-adaptive feature smoothing (NAFS), a simple non-parametric method that constructs node representations without parameter learning. NAFS first extracts the features of each node with its neighbors of different hops by feature smoothing, and then adaptively combines the smoothed features. Besides, the constructed node representation can further be enhanced by the ensemble of smoothed features extracted via different smoothing strategies. We conduct experiments on four benchmark datasets on two different application scenarios: node clustering and link prediction. Remarkably, NAFS with feature ensemble outperforms the state-of-the-art GNNs on these tasks and mitigates the aforementioned two limitations of most learning-based GNN counterparts.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22y.html PDF: https://proceedings.mlr.press/v162/zhang22y/zhang22y.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22y.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wentao family: Zhang - given: Zeang family: Sheng - given: Mingyu family: Yang - given: Yang family: Li - given: Yu family: Shen - given: Zhi family: Yang - given: Bin family: Cui editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26467-26483 id: zhang22y issued: date-parts: - 2022 - 6 - 28 firstpage: 26467 lastpage: 26483 published: 2022-06-28 00:00:00 +0000 - title: 'Correct-N-Contrast: a Contrastive Approach for Improving Robustness to Spurious Correlations' abstract: 'Spurious correlations pose a major challenge for robust machine learning. Models trained with empirical risk minimization (ERM) may learn to rely on correlations between class labels and spurious attributes, leading to poor performance on data groups without these correlations. This is challenging to address when the spurious attribute labels are unavailable. To improve worst-group performance on spuriously correlated data without training attribute labels, we propose Correct-N-Contrast (CNC), a contrastive approach to directly learn representations robust to spurious correlations. As ERM models can be good spurious attribute predictors, CNC works by (1) using a trained ERM model’s outputs to identify samples with the same class but dissimilar spurious features, and (2) training a robust model with contrastive learning to learn similar representations for these samples. To support CNC, we introduce new connections between worst-group error and a representation alignment loss that CNC aims to minimize. We empirically observe that worst-group error closely tracks with alignment loss, and prove that the alignment loss over a class helps upper-bound the class’s worst-group vs. average error gap. On popular benchmarks, CNC reduces alignment loss drastically, and achieves state-of-the-art worst-group accuracy by 3.6% average absolute lift. CNC is also competitive with oracle methods that require group labels.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22z.html PDF: https://proceedings.mlr.press/v162/zhang22z/zhang22z.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22z.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Michael family: Zhang - given: Nimit S family: Sohoni - given: Hongyang R family: Zhang - given: Chelsea family: Finn - given: Christopher family: Re editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26484-26516 id: zhang22z issued: date-parts: - 2022 - 6 - 28 firstpage: 26484 lastpage: 26516 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach' abstract: 'We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states. BRIEE interleaves latent states discovery, exploration, and exploitation together, and can provably learn a near-optimal policy with sample complexity scaling polynomially in the number of latent states, actions, and the time horizon, with no dependence on the size of the potentially infinite observation space. Empirically, we show that BRIEE is more sample efficient than the state-of-art Block MDP algorithm HOMER and other empirical RL baselines on challenging rich-observation combination lock problems which require deep exploration.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22aa.html PDF: https://proceedings.mlr.press/v162/zhang22aa/zhang22aa.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22aa.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xuezhou family: Zhang - given: Yuda family: Song - given: Masatoshi family: Uehara - given: Mengdi family: Wang - given: Alekh family: Agarwal - given: Wen family: Sun editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26517-26547 id: zhang22aa issued: date-parts: - 2022 - 6 - 28 firstpage: 26517 lastpage: 26547 published: 2022-06-28 00:00:00 +0000 - title: 'Partial Counterfactual Identification from Observational and Experimental Data' abstract: 'This paper investigates the problem of bounding counterfactual queries from an arbitrary collection of observational and experimental distributions and qualitative knowledge about the underlying data-generating model represented in the form of a causal diagram. We show that all counterfactual distributions in an arbitrary structural causal model (SCM) with discrete observed domains could be generated by a canonical family of SCMs with the same causal diagram where unobserved (exogenous) variables are also discrete, taking values in finite domains. Utilizing the canonical SCMs, we translate the problem of bounding counterfactuals into that of polynomial programming whose solution provides optimal bounds for the counterfactual query. 
Solving such polynomial programs is in general computationally expensive. We then develop effective Monte Carlo algorithms to approximate optimal bounds from a combination of observational and experimental data. Our algorithms are validated extensively on synthetic and real-world datasets.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ab.html PDF: https://proceedings.mlr.press/v162/zhang22ab/zhang22ab.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ab.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Junzhe family: Zhang - given: Jin family: Tian - given: Elias family: Bareinboim editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26548-26558 id: zhang22ab issued: date-parts: - 2022 - 6 - 28 firstpage: 26548 lastpage: 26558 published: 2022-06-28 00:00:00 +0000 - title: 'Set Norm and Equivariant Skip Connections: Putting the Deep in Deep Sets' abstract: 'Permutation invariant neural networks are a promising tool for predictive modeling of set data. We show, however, that existing architectures struggle to perform well when they are deep. In this work, we mathematically and empirically analyze normalization layers and residual connections in the context of deep permutation invariant neural networks. We develop set norm, a normalization tailored for sets, and introduce the “clean path principle” for equivariant residual connections alongside a novel benefit of such connections, the reduction of information loss. Based on our analysis, we propose Deep Sets++ and Set Transformer++, deep models that reach comparable or better performance than their original counterparts on a diverse suite of tasks. We additionally introduce Flow-RBC, a new single-cell dataset and real-world application of permutation invariant prediction. We open-source our data and code here: https://github.com/rajesh-lab/deep_permutation_invariant.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ac.html PDF: https://proceedings.mlr.press/v162/zhang22ac/zhang22ac.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ac.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lily family: Zhang - given: Veronica family: Tozzo - given: John family: Higgins - given: Rajesh family: Ranganath editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26559-26574 id: zhang22ac issued: date-parts: - 2022 - 6 - 28 firstpage: 26559 lastpage: 26574 published: 2022-06-28 00:00:00 +0000 - title: 'Learning to Estimate and Refine Fluid Motion with Physical Dynamics' abstract: 'Extracting information on fluid motion directly from images is challenging. Fluid flow represents a complex dynamic system governed by the Navier-Stokes equations. General optical flow methods are typically designed for rigid body motion, and thus struggle if applied to fluid motion estimation directly. 
Further, optical flow methods only focus on two consecutive frames without utilising historical temporal information, while the fluid motion (velocity field) can be considered a continuous trajectory constrained by time-dependent partial differential equations (PDEs). This discrepancy has the potential to induce physically inconsistent estimations. Here we propose an unsupervised learning based prediction-correction scheme for fluid flow estimation. An estimate is first given by a PDE-constrained optical flow predictor, which is then refined by a physics-based corrector. The proposed approach outperforms optical flow methods and shows competitive results compared to existing supervised learning based methods on a benchmark dataset. Furthermore, the proposed approach can generalize to complex real-world fluid scenarios where ground truth information is effectively unknowable. Finally, experiments demonstrate that the physics-based corrector can refine flow estimates by mimicking the operator splitting method commonly utilised in fluid dynamical simulation.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ad.html PDF: https://proceedings.mlr.press/v162/zhang22ad/zhang22ad.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ad.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mingrui family: Zhang - given: Jianhong family: Wang - given: James B family: Tlhomole - given: Matthew family: Piggott editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26575-26590 id: zhang22ad issued: date-parts: - 2022 - 6 - 28 firstpage: 26575 lastpage: 26590 published: 2022-06-28 00:00:00 +0000 - title: 'A Branch and Bound Framework for Stronger Adversarial Attacks of ReLU Networks' abstract: 'Strong adversarial attacks are important for evaluating the true robustness of deep neural networks. Most existing attacks search in the input space, e.g., using gradient descent, and may miss adversarial examples due to non-convexity. In this work, we systematically search for adversarial examples in the activation space of ReLU networks to tackle hard instances where none of the existing adversarial attacks succeed. Unfortunately, searching the activation space typically relies on generic mixed integer programming (MIP) solvers and is limited to small networks and easy problem instances. To improve scalability and practicability, we use branch and bound (BaB) with specialized GPU-based bound propagation methods, and propose a top-down beam-search approach to quickly identify the subspace that may contain adversarial examples. Moreover, we build an adversarial candidate pool using cheap attacks to further assist the search in activation space via diving techniques and a bottom-up large neighborhood search. Our adversarial attack framework, BaB-Attack, opens up a new opportunity for designing novel adversarial attacks not limited to searching the input space, and enables us to borrow techniques from integer programming theory and neural network verification. In experiments, we can successfully generate adversarial examples when existing attacks on the input space fail. Compared to off-the-shelf MIP-solver-based attacks that require significant computation, our framework outperforms them in both success rate and efficiency.'
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ae.html PDF: https://proceedings.mlr.press/v162/zhang22ae/zhang22ae.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ae.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Huan family: Zhang - given: Shiqi family: Wang - given: Kaidi family: Xu - given: Yihan family: Wang - given: Suman family: Jana - given: Cho-Jui family: Hsieh - given: Zico family: Kolter editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26591-26604 id: zhang22ae issued: date-parts: - 2022 - 6 - 28 firstpage: 26591 lastpage: 26604 published: 2022-06-28 00:00:00 +0000 - title: 'A Simple yet Universal Strategy for Online Convex Optimization' abstract: 'Recently, several universal methods have been proposed for online convex optimization, and attain minimax rates for multiple types of convex functions simultaneously. However, they need to design and optimize one surrogate loss for each type of functions, making it difficult to exploit the structure of the problem and utilize existing algorithms. In this paper, we propose a simple strategy for universal online convex optimization, which avoids these limitations. The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the linearized losses to aggregate predictions from experts. Specifically, the meta-algorithm is required to yield a second-order bound with excess losses, so that it can leverage strong convexity and exponential concavity to control the meta-regret. In this way, our strategy inherits the theoretical guarantee of any expert designed for strongly convex functions and exponentially concave functions, up to a double logarithmic factor. As a result, we can plug in off-the-shelf online solvers as black-box experts to deliver problem-dependent regret bounds. For general convex functions, it maintains the minimax optimality and also achieves a small-loss bound.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22af.html PDF: https://proceedings.mlr.press/v162/zhang22af/zhang22af.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22af.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lijun family: Zhang - given: Guanghui family: Wang - given: Jinfeng family: Yi - given: Tianbao family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26605-26623 id: zhang22af issued: date-parts: - 2022 - 6 - 28 firstpage: 26605 lastpage: 26623 published: 2022-06-28 00:00:00 +0000 - title: 'Low-Precision Stochastic Gradient Langevin Dynamics' abstract: 'While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. 
In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. We prove that the convergence of low-precision SGLD with full-precision gradient accumulators is less affected by the quantization error than its SGD counterpart in the strongly convex setting. To further enable low-precision gradient accumulators, we develop a new quantization function for SGLD that preserves the variance in each update step. We demonstrate that low-precision SGLD achieves comparable performance to full-precision SGLD with only 8 bits on a variety of deep learning tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ag.html PDF: https://proceedings.mlr.press/v162/zhang22ag/zhang22ag.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ag.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruqi family: Zhang - given: Andrew Gordon family: Wilson - given: Christopher family: De Sa editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26624-26644 id: zhang22ag issued: date-parts: - 2022 - 6 - 28 firstpage: 26624 lastpage: 26644 published: 2022-06-28 00:00:00 +0000 - title: 'Expression might be enough: representing pressure and demand for reinforcement learning based traffic signal control' abstract: 'Many studies confirmed that a proper traffic state representation is more important than complex algorithms for the classical traffic signal control (TSC) problem. In this paper, we (1) present a novel, flexible and efficient method, namely advanced max pressure (Advanced-MP), taking both running and queuing vehicles into consideration to decide whether to change current signal phase; (2) inventively design the traffic movement representation with the efficient pressure and effective running vehicles from Advanced-MP, namely advanced traffic state (ATS); and (3) develop a reinforcement learning (RL) based algorithm template, called Advanced-XLight, by combining ATS with the latest RL approaches, and generate two RL algorithms, namely "Advanced-MPLight" and "Advanced-CoLight" from Advanced-XLight. Comprehensive experiments on multiple real-world datasets show that: (1) the Advanced-MP outperforms baseline methods, and it is also efficient and reliable for deployment; and (2) Advanced-MPLight and Advanced-CoLight can achieve the state-of-the-art.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ah.html PDF: https://proceedings.mlr.press/v162/zhang22ah/zhang22ah.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ah.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Liang family: Zhang - given: Qiang family: Wu - given: Jun family: Shen - given: Linyuan family: Lü - given: Bo family: Du - given: Jianqing family: Wu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26645-26654 id: zhang22ah issued: date-parts: - 2022 - 6 - 28 firstpage: 26645 lastpage: 26654 published: 2022-06-28 00:00:00 +0000 - title: 'Uncertainty Modeling in Generative Compressed Sensing' abstract: 'Compressed sensing (CS) aims to recover a high-dimensional signal with structural priors from its low-dimensional linear measurements. Inspired by the huge success of deep neural networks in modeling the priors of natural signals, generative neural networks have recently been used to replace the hand-crafted structural priors in CS. However, the reconstruction capability of the generative model is fundamentally limited by the range of its generator, typically a small subset of the signal space of interest. To break this bottleneck and thus reconstruct those out-of-range signals, this paper presents a novel method called CS-BGM that can effectively expand the range of the generator. Specifically, CS-BGM introduces uncertainties to the latent variable and parameters of the generator, while adopting variational inference (VI) and maximum a posteriori (MAP) estimation to infer them. Theoretical analysis demonstrates that expanding the range of the generator is necessary for reducing the reconstruction error in generative CS. Extensive experiments show a consistent improvement of CS-BGM over the baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ai.html PDF: https://proceedings.mlr.press/v162/zhang22ai/zhang22ai.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ai.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yilang family: Zhang - given: Mengchu family: Xu - given: Xiaojun family: Mao - given: Jian family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26655-26668 id: zhang22ai issued: date-parts: - 2022 - 6 - 28 firstpage: 26655 lastpage: 26668 published: 2022-06-28 00:00:00 +0000 - title: 'Building Robust Ensembles via Margin Boosting' abstract: 'In the context of adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks, and as a result, has sub-optimal robustness. Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles. We view this problem from the perspective of margin-boosting and develop an algorithm for learning an ensemble with maximum margin.
Through extensive empirical evaluation on benchmark datasets, we show that our algorithm not only outperforms existing ensembling techniques but also outperforms large models trained in an end-to-end fashion. An important byproduct of our work is a margin-maximizing cross-entropy (MCE) loss, which is a better alternative to the standard cross-entropy (CE) loss. Empirically, we show that replacing the CE loss in state-of-the-art adversarial training techniques with our MCE loss leads to significant performance improvement.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22aj.html PDF: https://proceedings.mlr.press/v162/zhang22aj/zhang22aj.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22aj.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dinghuai family: Zhang - given: Hongyang family: Zhang - given: Aaron family: Courville - given: Yoshua family: Bengio - given: Pradeep family: Ravikumar - given: Arun Sai family: Suggala editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26669-26692 id: zhang22aj issued: date-parts: - 2022 - 6 - 28 firstpage: 26669 lastpage: 26692 published: 2022-06-28 00:00:00 +0000 - title: 'Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization' abstract: 'Adversarial training (AT) is a widely recognized defense mechanism for improving the robustness of deep neural networks against adversarial attacks. It is built on min-max optimization (MMO), where the minimizer (i.e., defender) seeks a robust model to minimize the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., attacker). However, the conventional MMO method makes AT hard to scale. Thus, Fast-AT and other recent algorithms attempt to simplify MMO by replacing its maximization step with a single gradient sign-based attack generation step. Although easy to implement, Fast-AT lacks theoretical guarantees, and its empirical performance is unsatisfactory due to the issue of robust catastrophic overfitting when training with strong adversaries. In this paper, we advance Fast-AT from the fresh perspective of bi-level optimization (BLO). We first show that the commonly-used Fast-AT is equivalent to using a stochastic gradient algorithm to solve a linearized BLO problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm performance. Inspired by BLO, we design and analyze a new set of robust training algorithms termed Fast Bi-level AT (Fast-BAT), which effectively defends against sign-based projected gradient descent (PGD) attacks without using any gradient sign method or explicit robust regularization. In practice, we show that our method yields substantial robustness improvements over multiple baselines across multiple models and datasets.'
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ak.html PDF: https://proceedings.mlr.press/v162/zhang22ak/zhang22ak.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ak.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yihua family: Zhang - given: Guanhua family: Zhang - given: Prashant family: Khanduri - given: Mingyi family: Hong - given: Shiyu family: Chang - given: Sijia family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26693-26712 id: zhang22ak issued: date-parts: - 2022 - 6 - 28 firstpage: 26693 lastpage: 26712 published: 2022-06-28 00:00:00 +0000 - title: 'Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory' abstract: 'Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially deep neural networks, has gained practical success. While statistical analysis has proved FQE to be minimax-optimal with tabular, linear and several nonparametric function families, its practical performance with more general function approximator is less theoretically understood. We focus on FQE with general differentiable function approximators, making our theory applicable to neural function approximations. We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator. In addition, we study bootstrapping FQE estimators for error distribution inference and estimating confidence intervals, accompanied by a Cramer-Rao lower bound that matches our upper bounds. The Z-estimation analysis provides a generalizable theoretical framework for studying off-policy estimation in RL and provides sharp statistical theory for FQE with differentiable function approximators.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22al.html PDF: https://proceedings.mlr.press/v162/zhang22al/zhang22al.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22al.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruiqi family: Zhang - given: Xuezhou family: Zhang - given: Chengzhuo family: Ni - given: Mengdi family: Wang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26713-26749 id: zhang22al issued: date-parts: - 2022 - 6 - 28 firstpage: 26713 lastpage: 26749 published: 2022-06-28 00:00:00 +0000 - title: 'ROCK: Causal Inference Principles for Reasoning about Commonsense Causality' abstract: 'Commonsense causality reasoning (CCR) aims at identifying plausible causes and effects in natural language descriptions that are deemed reasonable by an average person. Although being of great academic and practical interest, this problem is still shadowed by the lack of a well-posed theoretical framework; existing work usually relies on deep language models wholeheartedly, and is potentially susceptible to confounding co-occurrences. Motivated by classical causal principles, we articulate the central question of CCR and draw parallels between human subjects in observational studies and natural languages to adopt CCR to the potential-outcomes framework, which is the first such attempt for commonsense tasks. We propose a novel framework, ROCK, to Reason O(A)bout Commonsense K(C)ausality, which utilizes temporal signals as incidental supervision, and balances confounding effects using temporal propensities that are analogous to propensity scores. The ROCK implementation is modular and zero-shot, and demonstrates good CCR capabilities.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22am.html PDF: https://proceedings.mlr.press/v162/zhang22am/zhang22am.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22am.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiayao family: Zhang - given: Hongming family: Zhang - given: Weijie family: Su - given: Dan family: Roth editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26750-26771 id: zhang22am issued: date-parts: - 2022 - 6 - 28 firstpage: 26750 lastpage: 26771 published: 2022-06-28 00:00:00 +0000 - title: 'No-Regret Learning in Time-Varying Zero-Sum Games' abstract: 'Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. We consider a variant of this problem where the game payoff matrix changes over time, possibly in an adversarial manner. We first present three performance measures to guide the algorithmic design for this problem: 1) the well-studied individual regret, 2) an extension of duality gap, and 3) a new measure called dynamic Nash Equilibrium regret, which quantifies the cumulative difference between the player’s payoff and the minimax game value. 
Next, we develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under all these three performance measures. These guarantees are adaptive to different non-stationarity measures of the payoff matrices and, importantly, recover the best known results when the payoff matrix is fixed. Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property, along with several novel ingredients specifically designed for the time-varying game setting. Empirical results further validate the effectiveness of our algorithm.' volume: 162 URL: https://proceedings.mlr.press/v162/zhang22an.html PDF: https://proceedings.mlr.press/v162/zhang22an/zhang22an.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22an.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Mengxiao family: Zhang - given: Peng family: Zhao - given: Haipeng family: Luo - given: Zhi-Hua family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26772-26808 id: zhang22an issued: date-parts: - 2022 - 6 - 28 firstpage: 26772 lastpage: 26808 published: 2022-06-28 00:00:00 +0000 - title: 'PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance' abstract: 'Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks. However, these models contain enormous amounts of parameters, which restrict their deployment to real-world applications. To reduce the model size, researchers prune these models based on the weights’ importance scores. However, such scores are usually estimated on mini-batches during training, which incurs large variability/uncertainty due to mini-batch sampling and complicated training dynamics. As a result, some crucial weights could be pruned by commonly used pruning methods because of such uncertainty, which makes training unstable and hurts generalization. To resolve this issue, we propose PLATON, which captures the uncertainty of importance scores by upper confidence bound of importance estimation. In particular, for the weights with low importance scores but high uncertainty, PLATON tends to retain them and explores their capacity. We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification to validate the effectiveness of PLATON. Results demonstrate that PLATON manifests notable improvement under different sparsity levels. Our code is publicly available at https://github.com/QingruZhang/PLATON.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhang22ao.html PDF: https://proceedings.mlr.press/v162/zhang22ao/zhang22ao.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhang22ao.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qingru family: Zhang - given: Simiao family: Zuo - given: Chen family: Liang - given: Alexander family: Bukharin - given: Pengcheng family: He - given: Weizhu family: Chen - given: Tuo family: Zhao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26809-26823 id: zhang22ao issued: date-parts: - 2022 - 6 - 28 firstpage: 26809 lastpage: 26823 published: 2022-06-28 00:00:00 +0000 - title: 'NysADMM: faster composite convex optimization via low-rank approximation' abstract: 'This paper develops a scalable new algorithm, called NysADMM, to minimize a smooth convex loss function with a convex regularizer. NysADMM accelerates the inexact Alternating Direction Method of Multipliers (ADMM) by constructing a preconditioner for the ADMM subproblem from a randomized low-rank Nyström approximation. NysADMM comes with strong theoretical guarantees: it solves the ADMM subproblem in a constant number of iterations when the rank of the Nyström approximation is the effective dimension of the subproblem regularized Gram matrix. In practice, ranks much smaller than the effective dimension can succeed, so NysADMM uses an adaptive strategy to choose the rank that enjoys analogous guarantees. Numerical experiments on real-world datasets demonstrate that NysADMM can solve important applications, such as the lasso, logistic regression, and support vector machines, in half the time (or less) required by standard solvers. The breadth of problems on which NysADMM beats standard solvers is a surprise: it suggests that ADMM is a dominant paradigm for numerical optimization across a wide range of statistical learning problems that are usually solved with bespoke methods.' volume: 162 URL: https://proceedings.mlr.press/v162/zhao22a.html PDF: https://proceedings.mlr.press/v162/zhao22a/zhao22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Shipu family: Zhao - given: Zachary family: Frangella - given: Madeleine family: Udell editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26824-26840 id: zhao22a issued: date-parts: - 2022 - 6 - 28 firstpage: 26824 lastpage: 26840 published: 2022-06-28 00:00:00 +0000 - title: 'Toward Compositional Generalization in Object-Oriented World Modeling' abstract: 'Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve that. 
We introduce a conceptual environment, Object Library, and two instances, and deploy a principled pipeline to measure the generalization ability. Motivated by the formulation, we analyze several methods with exact or no compositional generalization ability using our framework, and design a differentiable approach, Homomorphic Object-oriented World Model (HOWM), that achieves soft but more efficient compositional generalization.' volume: 162 URL: https://proceedings.mlr.press/v162/zhao22b.html PDF: https://proceedings.mlr.press/v162/zhao22b/zhao22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Linfeng family: Zhao - given: Lingzhi family: Kong - given: Robin family: Walters - given: Lawson L.S. family: Wong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26841-26864 id: zhao22b issued: date-parts: - 2022 - 6 - 28 firstpage: 26841 lastpage: 26864 published: 2022-06-28 00:00:00 +0000 - title: 'Dynamic Regret of Online Markov Decision Processes' abstract: 'We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any sequence of feasible changing policies. The measure is strictly stronger than the standard static regret that benchmarks the learner’s performance with a fixed compared policy. We consider three foundational models of online MDPs, including episodic loop-free Stochastic Shortest Path (SSP), episodic SSP, and infinite-horizon MDPs. For the three models, we propose novel online ensemble algorithms and establish their dynamic regret guarantees respectively, in which the results for episodic (loop-free) SSP are provably minimax optimal in terms of time horizon and certain non-stationarity measure.' volume: 162 URL: https://proceedings.mlr.press/v162/zhao22c.html PDF: https://proceedings.mlr.press/v162/zhao22c/zhao22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Peng family: Zhao - given: Long-Fei family: Li - given: Zhi-Hua family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26865-26894 id: zhao22c issued: date-parts: - 2022 - 6 - 28 firstpage: 26865 lastpage: 26894 published: 2022-06-28 00:00:00 +0000 - title: 'Learning to Solve PDE-constrained Inverse Problems with Graph Networks' abstract: 'Learned graph neural networks (GNNs) have recently been established as fast and accurate alternatives for principled solvers in simulating the dynamics of physical systems. In many application domains across science and engineering, however, we are not only interested in a forward simulation but also in solving inverse problems with constraints defined by a partial differential equation (PDE). Here we explore GNNs to solve such PDE-constrained inverse problems. 
Given a sparse set of measurements, we are interested in recovering the initial condition or parameters of the PDE. We demonstrate that GNNs combined with autodecoder-style priors are well-suited for these tasks, achieving more accurate estimates of initial conditions or physical parameters than other learned approaches when applied to the wave equation or Navier Stokes equations. We also demonstrate computational speedups of up to 90x using GNNs compared to principled solvers.' volume: 162 URL: https://proceedings.mlr.press/v162/zhao22d.html PDF: https://proceedings.mlr.press/v162/zhao22d/zhao22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qingqing family: Zhao - given: David B family: Lindell - given: Gordon family: Wetzstein editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26895-26910 id: zhao22d issued: date-parts: - 2022 - 6 - 28 firstpage: 26895 lastpage: 26910 published: 2022-06-28 00:00:00 +0000 - title: 'Learning from Counterfactual Links for Link Prediction' abstract: 'Learning to predict missing links is important for many graph-based applications. Existing methods were designed to learn the association between observed graph structure and existence of link between a pair of nodes. However, the causal relationship between the two variables was largely ignored for learning to predict links on a graph. In this work, we visit this factor by asking a counterfactual question: "would the link still exist if the graph structure became different from observation?" Its answer, counterfactual links, will be able to augment the graph data for representation learning. To create these links, we employ causal models that consider the information (i.e., learned representations) of node pairs as context, global graph structural properties as treatment, and link existence as outcome. We propose a novel data augmentation-based link prediction method that creates counterfactual links and learns representations from both the observed and counterfactual links. Experiments on benchmark data show that our graph learning method achieves state-of-the-art performance on the task of link prediction.' volume: 162 URL: https://proceedings.mlr.press/v162/zhao22e.html PDF: https://proceedings.mlr.press/v162/zhao22e/zhao22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tong family: Zhao - given: Gang family: Liu - given: Daheng family: Wang - given: Wenhao family: Yu - given: Meng family: Jiang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26911-26926 id: zhao22e issued: date-parts: - 2022 - 6 - 28 firstpage: 26911 lastpage: 26926 published: 2022-06-28 00:00:00 +0000 - title: 'Global Optimization Networks' abstract: 'We consider the problem of estimating a good maximizer of a black-box function given noisy examples. 
We propose to fit a new type of function called a global optimization network (GON), defined as any composition of an invertible function and a unimodal function, whose unique global maximizer can be inferred in $\mathcal{O}(D)$ time, and used as the estimate. As an example way to construct GON functions, and interesting in its own right, we give new results for specifying multi-dimensional unimodal functions using lattice models with linear inequality constraints. We extend to conditional GONs that find a global maximizer conditioned on specified inputs of other dimensions. Experiments show the GON maximizers are statistically significantly better predictions than those produced by convex fits, GPR, or DNNs, and form more reasonable predictions for real-world problems.' volume: 162 URL: https://proceedings.mlr.press/v162/zhao22f.html PDF: https://proceedings.mlr.press/v162/zhao22f/zhao22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Sen family: Zhao - given: Erez family: Louidor - given: Maya family: Gupta editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26927-26957 id: zhao22f issued: date-parts: - 2022 - 6 - 28 firstpage: 26927 lastpage: 26957 published: 2022-06-28 00:00:00 +0000 - title: 'Certified Robustness Against Natural Language Attacks by Causal Intervention' abstract: 'Deep learning models have achieved great success in many fields, yet they are vulnerable to adversarial examples. This paper follows a causal perspective to look into the adversarial vulnerability and proposes Causal Intervention by Semantic Smoothing (CISS), a novel framework towards robustness against natural language attacks. Instead of merely fitting observational data, CISS learns causal effects p(y|do(x)) by smoothing in the latent semantic space to make robust predictions, which scales to deep architectures and avoids tedious construction of noise customized for specific attacks. CISS is provably robust against word substitution attacks, as well as empirically robust even when perturbations are strengthened by unknown attack algorithms. For example, on YELP, CISS surpasses the runner-up by 6.8% in terms of certified robustness against word substitutions, and achieves 80.7% empirical robustness when syntactic attacks are integrated.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhao22g.html PDF: https://proceedings.mlr.press/v162/zhao22g/zhao22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Haiteng family: Zhao - given: Chang family: Ma - given: Xinshuai family: Dong - given: Anh Tuan family: Luu - given: Zhi-Hong family: Deng - given: Hanwang family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26958-26970 id: zhao22g issued: date-parts: - 2022 - 6 - 28 firstpage: 26958 lastpage: 26970 published: 2022-06-28 00:00:00 +0000 - title: 'Efficient Learning for AlphaZero via Path Consistency' abstract: 'In recent years, deep reinforcement learning has made great breakthroughs on board games. Still, most of these works require huge computational resources for large-scale environmental interactions or self-play. This paper aims at building powerful models under a limited amount of self-play, roughly the amount a human could utilize throughout a lifetime. We propose a learning algorithm built on AlphaZero, with its path searching regularised by a path consistency (PC) optimality, i.e., values on one optimal search path should be identical. The algorithm is thus named PCZero. In implementation, the historical trajectory and the search paths scouted by MCTS strike a good balance between exploration and exploitation, which enhances the generalization ability effectively. PCZero obtains a $94.1\%$ winning rate against the champion of the 2015 Hex Computer Olympiad on $13\times 13$ Hex, much higher than the $84.3\%$ obtained by AlphaZero. The models consume only $900K$ self-play games, about the amount humans can study in a lifetime. The improvements of PCZero also generalize to Othello and Gomoku. Experiments further demonstrate the efficiency of PCZero in the offline learning setting.' volume: 162 URL: https://proceedings.mlr.press/v162/zhao22h.html PDF: https://proceedings.mlr.press/v162/zhao22h/zhao22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dengwei family: Zhao - given: Shikui family: Tu - given: Lei family: Xu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26971-26981 id: zhao22h issued: date-parts: - 2022 - 6 - 28 firstpage: 26971 lastpage: 26981 published: 2022-06-28 00:00:00 +0000 - title: 'Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning' abstract: 'How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for severely overparameterized networks nowadays. In this paper, we propose an effective method to improve model generalization by additionally penalizing the gradient norm of the loss function during optimization. We demonstrate that confining the gradient norm of the loss function could help lead the optimizers towards finding flat minima.
We leverage a first-order approximation to efficiently implement the corresponding gradient so that it fits well in the gradient descent framework. In our experiments, we confirm that when using our method, the generalization performance of various models improves on different datasets. Also, we show that the recent sharpness-aware minimization method (Foret et al., 2021) is a special, but not the best, case of our method, where the best case of our method can give new state-of-the-art performance on these tasks. Code is available at https://github.com/zhaoyang-0204/gnp.' volume: 162 URL: https://proceedings.mlr.press/v162/zhao22i.html PDF: https://proceedings.mlr.press/v162/zhao22i/zhao22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhao22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yang family: Zhao - given: Hao family: Zhang - given: Xiuyuan family: Hu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26982-26992 id: zhao22i issued: date-parts: - 2022 - 6 - 28 firstpage: 26982 lastpage: 26992 published: 2022-06-28 00:00:00 +0000 - title: 'Ripple Attention for Visual Perception with Sub-quadratic Complexity' abstract: 'Transformer architectures are now central to sequence modeling tasks. At their heart is the attention mechanism, which enables effective modeling of long-term dependencies in a sequence. Recently, transformers have been successfully applied in the computer vision domain, where 2D images are first segmented into patches and then treated as 1D sequences. Such linearization, however, impairs the notion of spatial locality in images, which bears important visual clues. To bridge the gap, we propose ripple attention, a sub-quadratic attention mechanism for vision transformers. Built upon recent kernel-based efficient attention mechanisms, we design a novel dynamic programming algorithm that weights the contributions of different tokens to a query with respect to their relative spatial distances in the 2D space in linear observed time. Extensive experiments and analyses demonstrate the effectiveness of ripple attention on various visual tasks.' volume: 162 URL: https://proceedings.mlr.press/v162/zheng22a.html PDF: https://proceedings.mlr.press/v162/zheng22a/zheng22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zheng22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lin family: Zheng - given: Huijie family: Pan - given: Lingpeng family: Kong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 26993-27010 id: zheng22a issued: date-parts: - 2022 - 6 - 28 firstpage: 26993 lastpage: 27010 published: 2022-06-28 00:00:00 +0000 - title: 'Linear Complexity Randomized Self-attention Mechanism' abstract: 'Recently, random feature attentions (RFAs) have been proposed to approximate the softmax attention in linear time and space complexity by linearizing the exponential kernel.
In this paper, we first propose a novel perspective to understand the bias in such approximation by recasting RFAs as self-normalized importance samplers. This perspective further sheds light on an unbiased estimator for the whole softmax attention, called randomized attention (RA). RA constructs positive random features via query-specific distributions and enjoys greatly improved approximation fidelity, albeit exhibiting quadratic complexity. By combining the expressiveness in RA and the efficiency in RFA, we develop a novel linear complexity self-attention mechanism called linear randomized attention (LARA). Extensive experiments across various domains demonstrate that RA and LARA significantly improve the performance of RFAs by a substantial margin.' volume: 162 URL: https://proceedings.mlr.press/v162/zheng22b.html PDF: https://proceedings.mlr.press/v162/zheng22b/zheng22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zheng22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Lin family: Zheng - given: Chong family: Wang - given: Lingpeng family: Kong editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27011-27041 id: zheng22b issued: date-parts: - 2022 - 6 - 28 firstpage: 27011 lastpage: 27041 published: 2022-06-28 00:00:00 +0000 - title: 'Online Decision Transformer' abstract: 'Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zheng22c.html PDF: https://proceedings.mlr.press/v162/zheng22c/zheng22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zheng22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Qinqing family: Zheng - given: Amy family: Zhang - given: Aditya family: Grover editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27042-27059 id: zheng22c issued: date-parts: - 2022 - 6 - 28 firstpage: 27042 lastpage: 27059 published: 2022-06-28 00:00:00 +0000 - title: 'Learning Efficient and Robust Ordinary Differential Equations via Invertible Neural Networks' abstract: 'Advances in differentiable numerical integrators have enabled the use of gradient descent techniques to learn ordinary differential equations (ODEs), where a flexible function approximator (often a neural network) is used to estimate the system dynamics, given as a time derivative. However, these integrators can be unsatisfactorily slow and unstable when learning systems of ODEs from long sequences. We propose to learn an ODE of interest from data by viewing its dynamics as a vector field related to another base vector field via a diffeomorphism (i.e., a differentiable bijection), represented by an invertible neural network (INN). By learning both the INN and the dynamics of the base ODE, we provide an avenue to offload some of the complexity in modelling the dynamics directly on to the INN. Consequently, by restricting the base ODE to be amenable to integration, we can speed up and improve the robustness of integrating trajectories from the learned system. We demonstrate the efficacy of our method in training and evaluating benchmark ODE systems, as well as within continuous-depth neural networks models. We show that our approach attains speed-ups of up to two orders of magnitude when integrating learned ODEs.' volume: 162 URL: https://proceedings.mlr.press/v162/zhi22a.html PDF: https://proceedings.mlr.press/v162/zhi22a/zhi22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhi22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weiming family: Zhi - given: Tin family: Lai - given: Lionel family: Ott - given: Edwin V. family: Bonilla - given: Fabio family: Ramos editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27060-27074 id: zhi22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27060 lastpage: 27074 published: 2022-06-28 00:00:00 +0000 - title: 'HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning' abstract: 'In this work we propose a HyperTransformer, a Transformer-based model for supervised and semi-supervised few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity Transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. 
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable.' volume: 162 URL: https://proceedings.mlr.press/v162/zhmoginov22a.html PDF: https://proceedings.mlr.press/v162/zhmoginov22a/zhmoginov22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhmoginov22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Andrey family: Zhmoginov - given: Mark family: Sandler - given: Maksym family: Vladymyrov editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27075-27098 id: zhmoginov22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27075 lastpage: 27098 published: 2022-06-28 00:00:00 +0000 - title: 'Describing Differences between Text Distributions with Natural Language' abstract: 'How do two distributions of text differ? Humans are slow at answering this, since discovering patterns might require tediously reading through hundreds of samples. We propose to automatically summarize the differences by “learning a natural language hypothesis": given two distributions $D_{0}$ and $D_{1}$, we search for a description that is more often true for $D_{1}$, e.g., “is military-related." To tackle this problem, we fine-tune GPT-3 to propose descriptions with the prompt: “[samples of $D_{0}$] + [samples of $D_{1}$] + the difference between them is \underline{\space\space\space\space}". We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier. On a benchmark of 54 real-world binary classification tasks, while GPT-3 Curie (13B) only generates a description similar to human annotation 7% of the time, the performance reaches 61% with fine-tuning and re-ranking, and our best system using GPT-3 Davinci (175B) reaches 76%. We apply our system to describe distribution shifts, debug dataset shortcuts, summarize unknown tasks, and label text clusters, and present analyses based on automatically generated descriptions.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhong22a.html PDF: https://proceedings.mlr.press/v162/zhong22a/zhong22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhong22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Ruiqi family: Zhong - given: Charlie family: Snell - given: Dan family: Klein - given: Jacob family: Steinhardt editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27099-27116 id: zhong22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27099 lastpage: 27116 published: 2022-06-28 00:00:00 +0000 - title: 'Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets' abstract: 'We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori. When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state space, and (iii) minimax optimization for equilibrium solving. We propose a pessimism-based algorithm, dubbed as pessimistic minimax value iteration (PMVI), which overcomes the distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving a correlated coarse equilibrium based on the two value functions. Furthermore, we establish a data-dependent upper bound on the suboptimality which recovers a sublinear rate without the assumption on uniform coverage of the dataset. We also prove an information-theoretical lower bound, which shows our upper bound is nearly minimax optimal, which suggests that the data-dependent term is intrinsic. Our theoretical results also highlight a notion of “relative uncertainty”, which characterizes the necessary and sufficient condition for achieving sample efficiency in offline MGs. To the best of our knowledge, we provide the first nearly minimax optimal result for offline MGs with function approximation.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhong22b.html PDF: https://proceedings.mlr.press/v162/zhong22b/zhong22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhong22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Han family: Zhong - given: Wei family: Xiong - given: Jiyuan family: Tan - given: Liwei family: Wang - given: Tong family: Zhang - given: Zhaoran family: Wang - given: Zhuoran family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27117-27142 id: zhong22b issued: date-parts: - 2022 - 6 - 28 firstpage: 27117 lastpage: 27142 published: 2022-06-28 00:00:00 +0000 - title: 'Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization' abstract: 'Stochastic high-order methods for finding first-order stationary points in nonconvex finite-sum optimization have witnessed increasing interest in recent years, and various upper and lower bounds of the oracle complexity have been proved. However, under standard regularity assumptions, existing complexity bounds are all dimension-dependent (e.g., polylogarithmic dependence), which contrasts with the dimension-free complexity bounds for stochastic first-order methods and deterministic high-order methods. In this paper, we show that the polylogarithmic dimension dependence gap is not essential and can be closed. More specifically, we propose stochastic high-order algorithms with novel first-order and high-order derivative estimators, which can achieve dimension-free complexity bounds. With the access to $p$-th order derivatives of the objective function, we prove that our algorithm finds $\epsilon$-stationary points with $O(n^{(2p-1)/(2p)}/\epsilon^{(p+1)/p})$ high-order oracle complexities, where $n$ is the number of individual functions. Our result strictly improves the complexity bounds of existing high-order deterministic methods with respect to the dependence on $n$, and it is dimension-free compared with existing stochastic high-order methods.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22a.html PDF: https://proceedings.mlr.press/v162/zhou22a/zhou22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dongruo family: Zhou - given: Quanquan family: Gu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27143-27158 id: zhou22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27143 lastpage: 27158 published: 2022-06-28 00:00:00 +0000 - title: 'A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines' abstract: 'A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals. Symbolic reward machines augment existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. 
This formalism lends itself well to inverse reinforcement learning, whereby the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22b.html PDF: https://proceedings.mlr.press/v162/zhou22b/zhou22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Weichao family: Zhou - given: Wenchao family: Li editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27159-27178 id: zhou22b issued: date-parts: - 2022 - 6 - 28 firstpage: 27159 lastpage: 27178 published: 2022-06-28 00:00:00 +0000 - title: 'On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features' abstract: 'When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. This phenomenon is called Neural Collapse (NC), which seems to take place regardless of the choice of loss functions. In this work, we justify NC under the mean squared error (MSE) loss, where recent empirical evidence shows that it performs comparably or even better than the de-facto cross-entropy loss. Under a simplified unconstrained feature model, we provide the first global landscape analysis for vanilla nonconvex MSE loss and show that the (only!) global minimizers are neural collapse solutions, while all other critical points are strict saddles whose Hessian exhibit negative curvature directions. Furthermore, we justify the usage of rescaled MSE loss by probing the optimization landscape around the NC solutions, showing that the landscape can be improved by tuning the rescaling hyperparameters. Finally, our theoretical findings are experimentally verified on practical network architectures.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhou22c.html PDF: https://proceedings.mlr.press/v162/zhou22c/zhou22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jinxin family: Zhou - given: Xiao family: Li - given: Tianyu family: Ding - given: Chong family: You - given: Qing family: Qu - given: Zhihui family: Zhu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27179-27202 id: zhou22c issued: date-parts: - 2022 - 6 - 28 firstpage: 27179 lastpage: 27202 published: 2022-06-28 00:00:00 +0000 - title: 'Model Agnostic Sample Reweighting for Out-of-Distribution Learning' abstract: 'Distributionally robust optimization (DRO) and invariant risk minimization (IRM) are two popular methods proposed to improve out-of-distribution (OOD) generalization performance of machine learning models. While effective for small models, it has been observed that these methods can be vulnerable to overfitting with large overparameterized models. This work proposes a principled method, Model Agnostic samPLe rEweighting (MAPLE), to effectively address OOD problem, especially in overparameterized scenarios. Our key idea is to find an effective reweighting of the training samples so that the standard empirical risk minimization training of a large model on the weighted training data leads to superior OOD generalization performance. The overfitting issue is addressed by considering a bilevel formulation to search for the sample reweighting, in which the generalization complexity depends on the search space of sample weights instead of the model size. We present theoretical analysis in linear case to prove the insensitivity of MAPLE to model size, and empirically verify its superiority in surpassing state-of-the-art methods by a large margin.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22d.html PDF: https://proceedings.mlr.press/v162/zhou22d/zhou22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiao family: Zhou - given: Yong family: Lin - given: Renjie family: Pi - given: Weizhong family: Zhang - given: Renzhe family: Xu - given: Peng family: Cui - given: Tong family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27203-27221 id: zhou22d issued: date-parts: - 2022 - 6 - 28 firstpage: 27203 lastpage: 27221 published: 2022-06-28 00:00:00 +0000 - title: 'Sparse Invariant Risk Minimization' abstract: 'Invariant Risk Minimization (IRM) is an emerging invariant feature extracting technique to help generalization with distributional shift. However, we find that there exists a basic and intractable contradiction between the model trainability and generalization ability in IRM. On one hand, recent studies on deep learning theory indicate the importance of large-sized or even overparameterized neural networks to make the model easy to train. 
On the other hand, unlike empirical risk minimization, which benefits from overparameterization, our empirical and theoretical analyses show that the generalization ability of IRM is much more easily destroyed by the overfitting caused by overparameterization. In this paper, we propose a simple yet effective paradigm named Sparse Invariant Risk Minimization (SparseIRM) to address this contradiction. Our key idea is to employ a global sparsity constraint as a defense to prevent spurious features from leaking in during the whole IRM process. Compared with the sparsify-after-training approach of prior work, which can discard invariant features, the global sparsity constraint limits the budget for feature selection and forces SparseIRM to select the invariant features. We illustrate the benefit of SparseIRM through a theoretical analysis on a simple linear case. Empirically, we demonstrate the power of SparseIRM on various datasets and models, surpassing state-of-the-art methods by a gap of up to 29%.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22e.html PDF: https://proceedings.mlr.press/v162/zhou22e/zhou22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiao family: Zhou - given: Yong family: Lin - given: Weizhong family: Zhang - given: Tong family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27222-27244 id: zhou22e issued: date-parts: - 2022 - 6 - 28 firstpage: 27222 lastpage: 27244 published: 2022-06-28 00:00:00 +0000 - title: 'Prototype-Anchored Learning for Learning with Imperfect Annotations' abstract: 'The success of deep neural networks greatly relies on the availability of large amounts of high-quality annotated data, which, however, are difficult or expensive to obtain. The resulting labels may be class-imbalanced, noisy, or human-biased. It is challenging to learn unbiased classification models from imperfectly annotated datasets, on which we usually suffer from overfitting or underfitting. In this work, we thoroughly investigate the popular softmax loss and margin-based loss, and offer a feasible approach to tighten the generalization error bound by maximizing the minimal sample margin. We further derive the optimality condition for this purpose, which indicates how the class prototypes should be anchored. Motivated by the theoretical analysis, we propose a simple yet effective method, namely prototype-anchored learning (PAL), which can be easily incorporated into various learning-based classification schemes to handle imperfect annotation. We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.'
volume: 162 URL: https://proceedings.mlr.press/v162/zhou22f.html PDF: https://proceedings.mlr.press/v162/zhou22f/zhou22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiong family: Zhou - given: Xianming family: Liu - given: Deming family: Zhai - given: Junjun family: Jiang - given: Xin family: Gao - given: Xiangyang family: Ji editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27245-27267 id: zhou22f issued: date-parts: - 2022 - 6 - 28 firstpage: 27245 lastpage: 27267 published: 2022-06-28 00:00:00 +0000 - title: 'FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting' abstract: 'Long-term time series forecasting is challenging since prediction accuracy tends to decrease dramatically with the increasing horizon. Although Transformer-based methods have significantly improved state-of-the-art results for long-term forecasting, they are not only computationally expensive but more importantly, are unable to capture the global view of time series (e.g. overall trend). To address these problems, we propose to combine Transformer with the seasonal-trend decomposition method, in which the decomposition method captures the global profile of time series while Transformers capture more detailed structures. To further enhance the performance of Transformer for long-term prediction, we exploit the fact that most time series tend to have a sparse representation in a well-known basis such as Fourier transform, and develop a frequency enhanced Transformer. Besides being more effective, the proposed method, termed as Frequency Enhanced Decomposed Transformer (FEDformer), is more efficient than standard Transformer with a linear complexity to the sequence length. Our empirical studies with six benchmark datasets show that compared with state-of-the-art methods, Fedformer can reduce prediction error by 14.8% and 22.6% for multivariate and univariate time series, respectively. Code is publicly available at https://github.com/MAZiqing/FEDformer.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22g.html PDF: https://proceedings.mlr.press/v162/zhou22g/zhou22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tian family: Zhou - given: Ziqing family: Ma - given: Qingsong family: Wen - given: Xue family: Wang - given: Liang family: Sun - given: Rong family: Jin editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27268-27286 id: zhou22g issued: date-parts: - 2022 - 6 - 28 firstpage: 27268 lastpage: 27286 published: 2022-06-28 00:00:00 +0000 - title: 'Probabilistic Bilevel Coreset Selection' abstract: 'The goal of coreset selection in supervised learning is to produce a weighted subset of data, so that training only on the subset achieves similar performance as training on the entire dataset. 
Existing methods have achieved promising results in resource-constrained scenarios such as continual learning and streaming. However, most of the existing algorithms are limited to traditional machine learning models. A few algorithms that can handle large models adopt greedy search approaches due to the difficulty in solving the discrete subset selection problem, which is computationally costly when the coreset becomes large and often produces suboptimal results. In this work, for the first time we propose a continuous probabilistic bilevel formulation of coreset selection by learning a probabilistic weight for each training sample. The overall objective is posed as a bilevel optimization problem, where 1) the inner loop samples coresets and trains the model to convergence and 2) the outer loop updates the sample probability progressively according to the model’s performance. Importantly, we develop an efficient solver for the bilevel optimization problem via an unbiased policy gradient without the trouble of implicit differentiation. We theoretically prove the convergence of this training procedure and demonstrate the superiority of our algorithm against various coreset selection methods on various tasks, especially in the more challenging label-noise and class-imbalance scenarios.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22h.html PDF: https://proceedings.mlr.press/v162/zhou22h/zhou22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xiao family: Zhou - given: Renjie family: Pi - given: Weizhong family: Zhang - given: Yong family: Lin - given: Zonghao family: Chen - given: Tong family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27287-27302 id: zhou22h issued: date-parts: - 2022 - 6 - 28 firstpage: 27287 lastpage: 27302 published: 2022-06-28 00:00:00 +0000 - title: 'Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets' abstract: 'In this paper, we consider approximate Frank-Wolfe (FW) algorithms to solve convex optimization problems over graph-structured support sets where the linear minimization oracle (LMO) cannot be efficiently obtained in general. We first demonstrate that two popular approximation assumptions (additive and multiplicative gap errors) are not applicable in that no cheap gap-approximate LMO oracle exists. Thus, approximate dual maximization oracles (DMO) are proposed, which approximate the inner product rather than the gap. We prove that the standard FW method using a $\delta$-approximate DMO converges as $O((1-\delta) \sqrt{s}/\delta)$ in the worst case, and as $O(L/(\delta^2 t))$ over a $\delta$-relaxation of the constraint set. Furthermore, when the solution is on the boundary, a variant of FW converges as $O(1/t^2)$ under the quadratic growth assumption. Our empirical results suggest that even these improved bounds are pessimistic, showing fast convergence in recovering real-world images with graph-structured sparsity.'
volume: 162 URL: https://proceedings.mlr.press/v162/zhou22i.html PDF: https://proceedings.mlr.press/v162/zhou22i/zhou22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Baojian family: Zhou - given: Yifan family: Sun editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27303-27337 id: zhou22i issued: date-parts: - 2022 - 6 - 28 firstpage: 27303 lastpage: 27337 published: 2022-06-28 00:00:00 +0000 - title: 'Improving Adversarial Robustness via Mutual Information Estimation' abstract: 'Deep neural networks (DNNs) are found to be vulnerable to adversarial noise. They are typically misled by adversarial samples to make wrong predictions. To alleviate this negative effect, in this paper, we investigate the dependence between outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we first measure the dependence by estimating the mutual information (MI) between outputs and the natural patterns of inputs (called natural MI) and MI between outputs and the adversarial patterns of inputs (called adversarial MI), respectively. We find that adversarial samples usually have larger adversarial MI and smaller natural MI compared with those w.r.t. natural samples. Motivated by this observation, we propose to enhance the adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during the training process. In this way, the target model is expected to pay more attention to the natural pattern that contains objective semantics. Empirical evaluations demonstrate that our method could effectively improve the adversarial accuracy against multiple attacks.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22j.html PDF: https://proceedings.mlr.press/v162/zhou22j/zhou22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dawei family: Zhou - given: Nannan family: Wang - given: Xinbo family: Gao - given: Bo family: Han - given: Xiaoyu family: Wang - given: Yibing family: Zhan - given: Tongliang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27338-27352 id: zhou22j issued: date-parts: - 2022 - 6 - 28 firstpage: 27338 lastpage: 27352 published: 2022-06-28 00:00:00 +0000 - title: 'Modeling Adversarial Noise for Adversarial Training' abstract: 'Deep neural networks have been demonstrated to be vulnerable to adversarial noise, promoting the development of defense against adversarial attacks. Motivated by the fact that adversarial noise contains well-generalizing features and that the relationship between adversarial data and natural data can help infer natural data and make reliable predictions, in this paper, we study to model adversarial noise by learning the transition relationship between adversarial labels (i.e. 
the flipped labels used to generate adversarial data) and natural labels (i.e. the ground truth labels of the natural data). Specifically, we introduce an instance-dependent transition matrix to relate adversarial labels and natural labels, which can be seamlessly embedded with the target model (enabling us to model stronger adaptive adversarial noise). Empirical evaluations demonstrate that our method could effectively improve adversarial accuracy.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22k.html PDF: https://proceedings.mlr.press/v162/zhou22k/zhou22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dawei family: Zhou - given: Nannan family: Wang - given: Bo family: Han - given: Tongliang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27353-27366 id: zhou22k issued: date-parts: - 2022 - 6 - 28 firstpage: 27353 lastpage: 27366 published: 2022-06-28 00:00:00 +0000 - title: 'Contrastive Learning with Boosted Memorization' abstract: 'Self-supervised learning has achieved a great success in the representation learning of visual and textual data. However, the current methods are mainly validated on the well-curated datasets, which do not exhibit the real-world long-tailed distribution. Recent attempts to consider self-supervised long-tailed learning are made by rebalancing in the loss perspective or the model perspective, resembling the paradigms in the supervised long-tailed learning. Nevertheless, without the aid of labels, these explorations have not shown the expected significant promise due to the limitation in tail sample discovery or the heuristic structure design. Different from previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method. Specifically, BCL leverages the memorization effect of deep neural networks to automatically drive the information discrepancy of the sample views in contrastive learning, which is more efficient to enhance the long-tailed learning in the label-unaware context. Extensive experiments on a range of benchmark datasets demonstrate the effectiveness of BCL over several state-of-the-art methods. Our code is available at https://github.com/MediaBrain-SJTU/BCL.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhou22l.html PDF: https://proceedings.mlr.press/v162/zhou22l/zhou22l.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22l.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhihan family: Zhou - given: Jiangchao family: Yao - given: Yan-Feng family: Wang - given: Bo family: Han - given: Ya family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27367-27377 id: zhou22l issued: date-parts: - 2022 - 6 - 28 firstpage: 27367 lastpage: 27377 published: 2022-06-28 00:00:00 +0000 - title: 'Understanding The Robustness in Vision Transformers' abstract: 'Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of an explanatory framework towards a more systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of self-attention in visual grouping which indicate that self-attention could promote improved mid-level representation and robustness. We thus propose a family of fully attentional networks (FANs) that incorporate self-attention in both token mixing and channel processing. We validate the design comprehensively on various hierarchical backbones. Our model with a DeiT architecture achieves a state-of-the-art 47.6% mCE on ImageNet-C with 29M parameters. We also demonstrate significantly improved robustness in two downstream tasks: semantic segmentation and object detection' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22m.html PDF: https://proceedings.mlr.press/v162/zhou22m/zhou22m.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22m.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Daquan family: Zhou - given: Zhiding family: Yu - given: Enze family: Xie - given: Chaowei family: Xiao - given: Animashree family: Anandkumar - given: Jiashi family: Feng - given: Jose M. family: Alvarez editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27378-27394 id: zhou22m issued: date-parts: - 2022 - 6 - 28 firstpage: 27378 lastpage: 27394 published: 2022-06-28 00:00:00 +0000 - title: 'VLUE: A Multi-Task Multi-Dimension Benchmark for Evaluating Vision-Language Pre-training' abstract: 'Recent advances in vision-language pre-training (VLP) have demonstrated impressive performance in a range of vision-language (VL) tasks. However, there exist several challenges for measuring the community’s progress in building general multi-modal intelligence. First, most of the downstream VL datasets are annotated using raw images that are already seen during pre-training, which may result in an overestimation of current VLP models’ generalization ability. 
Second, recent VLP work mainly focuses on absolute performance but overlooks the efficiency-performance trade-off, which is also an important indicator for measuring progress. To this end, we introduce the Vision-Language Understanding Evaluation (VLUE) benchmark, a multi-task multi-dimension benchmark for evaluating the generalization capabilities and the efficiency-performance trade-off (“Pareto SOTA”) of VLP models. We demonstrate that there is a sizable generalization gap for all VLP models when testing on out-of-distribution test sets annotated on images from a more diverse distribution that spreads across cultures. Moreover, we find that measuring the efficiency-performance trade-off of VLP models leads to complementary insights for several design choices of VLP. We release the VLUE benchmark to promote research on building vision-language models that generalize well to images unseen during pre-training and are practical in terms of efficiency-performance trade-off.' volume: 162 URL: https://proceedings.mlr.press/v162/zhou22n.html PDF: https://proceedings.mlr.press/v162/zhou22n/zhou22n.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhou22n.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Wangchunshu family: Zhou - given: Yan family: Zeng - given: Shizhe family: Diao - given: Xinsong family: Zhang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27395-27411 id: zhou22n issued: date-parts: - 2022 - 6 - 28 firstpage: 27395 lastpage: 27411 published: 2022-06-28 00:00:00 +0000 - title: 'Detecting Corrupted Labels Without Training a Model to Predict' abstract: 'Label noise in real-world datasets encodes wrong correlation patterns and impairs the generalization of deep neural networks (DNNs). It is critical to find efficient ways to detect corrupted patterns. Current methods primarily focus on designing robust training techniques to prevent DNNs from memorizing corrupted patterns. These approaches often require customized training processes and may overfit corrupted patterns, leading to a performance drop in detection. In this paper, from a more data-centric perspective, we propose a training-free solution to detect corrupted labels. Intuitively, “closer” instances are more likely to share the same clean label. Based on the neighborhood information, we propose two methods: the first one uses “local voting" via checking the noisy label consensuses of nearby features. The second one is a ranking-based approach that scores each instance and filters out a guaranteed number of instances that are likely to be corrupted. We theoretically analyze how the quality of features affects the local voting and provide guidelines for tuning neighborhood size. We also prove the worst-case error bound for the ranking-based method. Experiments with both synthetic and real-world label noise demonstrate our training-free solutions consistently and significantly improve most of the training-based baselines. Code is available at github.com/UCSC-REAL/SimiFeat.' 
volume: 162 URL: https://proceedings.mlr.press/v162/zhu22a.html PDF: https://proceedings.mlr.press/v162/zhu22a/zhu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhaowei family: Zhu - given: Zihao family: Dong - given: Yang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27412-27427 id: zhu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27412 lastpage: 27427 published: 2022-06-28 00:00:00 +0000 - title: 'Contextual Bandits with Large Action Spaces: Made Practical' abstract: 'A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives (“actions”) is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice. We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces. Our algorithm makes use of computational oracles for (i) supervised learning, and (ii) optimization over the action space, and achieves sample complexity, runtime, and memory independent of the size of the action space. In addition, it is simple and practical. We perform a large-scale empirical evaluation, and show that our approach typically enjoys superior performance and efficiency compared to standard baselines.' volume: 162 URL: https://proceedings.mlr.press/v162/zhu22b.html PDF: https://proceedings.mlr.press/v162/zhu22b/zhu22b.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yinglun family: Zhu - given: Dylan J family: Foster - given: John family: Langford - given: Paul family: Mineiro editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27428-27453 id: zhu22b issued: date-parts: - 2022 - 6 - 28 firstpage: 27428 lastpage: 27453 published: 2022-06-28 00:00:00 +0000 - title: 'Neural-Symbolic Models for Logical Queries on Knowledge Graphs' abstract: 'Answering complex first-order logic (FOL) queries on knowledge graphs is a fundamental task for multi-hop reasoning. Traditional symbolic methods traverse a complete knowledge graph to extract the answers, which provides good interpretation for each step. Recent neural methods learn geometric embeddings for complex queries. These methods can generalize to incomplete knowledge graphs, but their reasoning process is hard to interpret. In this paper, we propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds. 
GNN-QE decomposes a complex FOL query into relation projections and logical operations over fuzzy sets, which provides interpretability for intermediate variables. To reason about the missing links, GNN-QE adapts a graph neural network from knowledge graph completion to execute the relation projections, and models the logical operations with product fuzzy logic. Experiments on 3 datasets show that GNN-QE significantly improves over previous state-of-the-art models in answering FOL queries. Meanwhile, GNN-QE can predict the number of answers without explicit supervision, and provide visualizations for intermediate variables.' volume: 162 URL: https://proceedings.mlr.press/v162/zhu22c.html PDF: https://proceedings.mlr.press/v162/zhu22c/zhu22c.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhaocheng family: Zhu - given: Mikhail family: Galkin - given: Zuobai family: Zhang - given: Jian family: Tang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27454-27478 id: zhu22c issued: date-parts: - 2022 - 6 - 28 firstpage: 27454 lastpage: 27478 published: 2022-06-28 00:00:00 +0000 - title: 'Topology-aware Generalization of Decentralized SGD' abstract: 'This paper studies the algorithmic stability and generalizability of decentralized stochastic gradient descent (D-SGD). We prove that the consensus model learned by D-SGD is $\mathcal{O}(m/N + 1/m + \lambda^2)$-stable in expectation in the non-convex non-smooth setting, where $N$ is the total sample size of the whole system, $m$ is the number of workers, and $1-\lambda$ is the spectral gap that measures the connectivity of the communication topology. These results then deliver an $\mathcal{O}\big(1/N + \big((m^{-1}\lambda^2)^{\frac{\alpha}{2}} + m^{-\alpha}\big)/N^{1-\frac{\alpha}{2}}\big)$ in-average generalization bound, which is non-vacuous even when $\lambda$ is close to $1$, in contrast to the vacuous bounds suggested by existing literature on the projected version of D-SGD. Our theory indicates that the generalizability of D-SGD is positively correlated with the spectral gap, and can explain why consensus control in the initial training phase ensures better generalization. Experiments with VGG-11 and ResNet-18 on CIFAR-10, CIFAR-100 and Tiny-ImageNet justify our theory. To the best of our knowledge, this is the first work on the topology-aware generalization of vanilla D-SGD. Code is available at \url{https://github.com/Raiden-Zhu/Generalization-of-DSGD}.'
volume: 162 URL: https://proceedings.mlr.press/v162/zhu22d.html PDF: https://proceedings.mlr.press/v162/zhu22d/zhu22d.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tongtian family: Zhu - given: Fengxiang family: He - given: Lan family: Zhang - given: Zhengyang family: Niu - given: Mingli family: Song - given: Dacheng family: Tao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27479-27503 id: zhu22d issued: date-parts: - 2022 - 6 - 28 firstpage: 27479 lastpage: 27503 published: 2022-06-28 00:00:00 +0000 - title: 'Resilient and Communication Efficient Learning for Heterogeneous Federated Systems' abstract: 'The rise of Federated Learning (FL) is bringing machine learning to edge computing by utilizing data scattered across edge devices. However, the heterogeneity of edge network topologies and the uncertainty of wireless transmission are two major obstructions of FL’s wide application in edge computing, leading to prohibitive convergence time and high communication cost. In this work, we propose an FL scheme to address both challenges simultaneously. Specifically, we enable edge devices to learn self-distilled neural networks that are readily prunable to arbitrary sizes, which capture the knowledge of the learning domain in a nested and progressive manner. Not only does our approach tackle system heterogeneity by serving edge devices with varying model architectures, but it also alleviates the issue of connection uncertainty by allowing transmitting part of the model parameters under faulty network connections, without wasting the contributing knowledge of the transmitted parameters. Extensive empirical studies show that under system heterogeneity and network instability, our approach demonstrates significant resilience and higher communication efficiency compared to the state-of-the-art.' volume: 162 URL: https://proceedings.mlr.press/v162/zhu22e.html PDF: https://proceedings.mlr.press/v162/zhu22e/zhu22e.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhuangdi family: Zhu - given: Junyuan family: Hong - given: Steve family: Drew - given: Jiayu family: Zhou editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27504-27526 id: zhu22e issued: date-parts: - 2022 - 6 - 28 firstpage: 27504 lastpage: 27526 published: 2022-06-28 00:00:00 +0000 - title: 'On Numerical Integration in Neural Ordinary Differential Equations' abstract: 'The combination of ordinary differential equations and neural networks, i.e., neural ordinary differential equations (Neural ODE), has been widely studied from various angles. However, deciphering the numerical integration in Neural ODE is still an open challenge, as many researches demonstrated that numerical integration significantly affects the performance of the model. 
In this paper, we propose the inverse modified differential equations (IMDE) to clarify the influence of numerical integration on training Neural ODE models. IMDE is determined by the learning task and the employed ODE solver. It is shown that training a Neural ODE model actually returns a close approximation of the IMDE, rather than the true ODE. With the help of IMDE, we deduce that (i) the discrepancy between the learned model and the true ODE is bounded by the sum of discretization error and learning loss; (ii) Neural ODE models using non-symplectic numerical integration theoretically fail to learn conservation laws. Several experiments are performed to numerically verify our theoretical analysis.' volume: 162 URL: https://proceedings.mlr.press/v162/zhu22f.html PDF: https://proceedings.mlr.press/v162/zhu22f/zhu22f.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22f.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Aiqing family: Zhu - given: Pengzhan family: Jin - given: Beibei family: Zhu - given: Yifa family: Tang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27527-27547 id: zhu22f issued: date-parts: - 2022 - 6 - 28 firstpage: 27527 lastpage: 27547 published: 2022-06-28 00:00:00 +0000 - title: 'When AUC meets DRO: Optimizing Partial AUC for Deep Learning with Non-Convex Convergence Guarantee' abstract: 'In this paper, we propose systematic and efficient gradient-based methods for both one-way and two-way partial AUC (pAUC) maximization that are applicable to deep learning. We propose new formulations of pAUC surrogate objectives by using distributionally robust optimization (DRO) to define the loss for each individual positive data point. We consider two formulations of DRO, one based on conditional-value-at-risk (CVaR), which yields a non-smooth but exact estimator for pAUC, and the other based on a KL divergence regularized DRO, which yields an inexact but smooth (soft) estimator for pAUC. For both one-way and two-way pAUC maximization, we propose two algorithms and prove their convergence for optimizing their two formulations, respectively. Experiments demonstrate the effectiveness of the proposed algorithms for pAUC maximization for deep learning on various datasets.'
volume: 162 URL: https://proceedings.mlr.press/v162/zhu22g.html PDF: https://proceedings.mlr.press/v162/zhu22g/zhu22g.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22g.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Dixian family: Zhu - given: Gang family: Li - given: Bokun family: Wang - given: Xiaodong family: Wu - given: Tianbao family: Yang editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27548-27573 id: zhu22g issued: date-parts: - 2022 - 6 - 28 firstpage: 27548 lastpage: 27573 published: 2022-06-28 00:00:00 +0000 - title: 'Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces' abstract: 'Designing efficient general-purpose contextual bandit algorithms that work with large—or even infinite—action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control. While obtaining standard regret guarantees can be hopeless, alternative regret notions have been proposed to tackle the large action setting. We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives. We design a statistically and computationally efficient algorithm—for the proposed smooth regret—that works with general function approximation under standard supervised oracles. We also present an adaptive algorithm that automatically adapts to any smoothness level. Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret, e.g., in bandit problems with multiple best arms and Lipschitz/Hölder bandits. We conduct large-scale empirical evaluations demonstrating the efficacy of our proposed algorithms.' volume: 162 URL: https://proceedings.mlr.press/v162/zhu22h.html PDF: https://proceedings.mlr.press/v162/zhu22h/zhu22h.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22h.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Yinglun family: Zhu - given: Paul family: Mineiro editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27574-27590 id: zhu22h issued: date-parts: - 2022 - 6 - 28 firstpage: 27574 lastpage: 27590 published: 2022-06-28 00:00:00 +0000 - title: 'Residual-Based Sampling for Online Outlier-Robust PCA' abstract: 'Outlier-robust principal component analysis (ORPCA) has been broadly applied in scientific discovery over the last decades. In this paper, we study online ORPCA, an important variant that addresses the practical challenge that the data points arrive in a sequential manner and the goal is to recover the underlying subspace of the clean data with one pass of the data. Our main contribution is the first provable algorithm that enjoys a comparable recovery guarantee to the best known batch algorithm, while significantly improving upon the state-of-the-art online ORPCA algorithms.
The core technique is a robust version of the residual norm which, informally speaking, leverages not only the importance of a data point, but also how likely it is to behave as an outlier.' volume: 162 URL: https://proceedings.mlr.press/v162/zhu22i.html PDF: https://proceedings.mlr.press/v162/zhu22i/zhu22i.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22i.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Tianhao family: Zhu - given: Jie family: Shen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27591-27611 id: zhu22i issued: date-parts: - 2022 - 6 - 28 firstpage: 27591 lastpage: 27611 published: 2022-06-28 00:00:00 +0000 - title: 'Region-Based Semantic Factorization in GANs' abstract: 'Despite the rapid advancement of semantic discovery in the latent space of Generative Adversarial Networks (GANs), existing approaches are either limited to finding global attributes or rely on a number of segmentation masks to identify local attributes. In this work, we present a highly efficient algorithm to factorize the latent semantics learned by GANs concerning an arbitrary image region. Concretely, we revisit the task of local manipulation with pre-trained GANs and formulate region-based semantic discovery as a dual optimization problem. Through an appropriately defined generalized Rayleigh quotient, we manage to solve such a problem without any annotations or training. Experimental results on various state-of-the-art GAN models demonstrate the effectiveness of our approach, as well as its superiority over prior art regarding precise control, region robustness, speed of implementation, and simplicity of use.' volume: 162 URL: https://proceedings.mlr.press/v162/zhu22j.html PDF: https://proceedings.mlr.press/v162/zhu22j/zhu22j.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22j.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Jiapeng family: Zhu - given: Yujun family: Shen - given: Yinghao family: Xu - given: Deli family: Zhao - given: Qifeng family: Chen editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27612-27632 id: zhu22j issued: date-parts: - 2022 - 6 - 28 firstpage: 27612 lastpage: 27632 published: 2022-06-28 00:00:00 +0000 - title: 'Beyond Images: Label Noise Transition Matrix Estimation for Tasks with Lower-Quality Features' abstract: 'The label noise transition matrix, denoting the transition probabilities from clean labels to noisy labels, is crucial for designing statistically robust solutions. Existing estimators for noise transition matrices, e.g., using either anchor points or clusterability, focus on computer vision tasks, for which it is relatively easy to obtain high-quality representations. We observe that tasks with lower-quality features fail to meet the anchor-point or clusterability condition, due to the coexistence of both uninformative and informative representations.
To handle this issue, we propose a generic and practical information-theoretic approach to down-weight the less informative parts of the lower-quality features. This improvement is crucial to identifying and estimating the label noise transition matrix. The salient technical challenge is to compute the relevant information-theoretic metrics using only noisy labels instead of clean ones. We prove that the celebrated $f$-mutual information measure can often preserve the order when calculated using noisy labels. We then build our transition matrix estimator using this distilled version of features. The necessity and effectiveness of the proposed method are also demonstrated by evaluating the estimation error on a varied set of tabular data and text classification tasks with lower-quality features. Code is available at github.com/UCSC-REAL/BeyondImages.' volume: 162 URL: https://proceedings.mlr.press/v162/zhu22k.html PDF: https://proceedings.mlr.press/v162/zhu22k/zhu22k.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zhu22k.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Zhaowei family: Zhu - given: Jialu family: Wang - given: Yang family: Liu editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27633-27653 id: zhu22k issued: date-parts: - 2022 - 6 - 28 firstpage: 27633 lastpage: 27653 published: 2022-06-28 00:00:00 +0000 - title: 'Towards Uniformly Superhuman Autonomy via Subdominance Minimization' abstract: 'Prevalent imitation learning methods seek to produce behavior that matches or exceeds average human performance. This often prevents achieving expert-level or superhuman performance when identifying the better demonstrations to imitate is difficult. We instead assume demonstrations are of varying quality and seek to induce behavior that is unambiguously better (i.e., Pareto dominant or minimally subdominant) than all human demonstrations. Our minimum subdominance inverse optimal control training objective is primarily defined by high quality demonstrations; lower quality demonstrations, which are more easily dominated, are effectively ignored instead of degrading imitation. With increasing probability, our approach produces superhuman behavior incurring lower cost than demonstrations on the demonstrator’s unknown cost function, even if that cost function differs for each demonstration. We apply our approach on a computer cursor pointing task, producing behavior that is 78% superhuman, while minimizing demonstration suboptimality provides 50% superhuman behavior, and only 72% even after selective data cleaning.'
volume: 162 URL: https://proceedings.mlr.press/v162/ziebart22a.html PDF: https://proceedings.mlr.press/v162/ziebart22a/ziebart22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-ziebart22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Brian family: Ziebart - given: Sanjiban family: Choudhury - given: Xinyan family: Yan - given: Paul family: Vernaza editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27654-27670 id: ziebart22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27654 lastpage: 27670 published: 2022-06-28 00:00:00 +0000 - title: 'Inductive Matrix Completion: No Bad Local Minima and a Fast Algorithm' abstract: 'The inductive matrix completion (IMC) problem is to recover a low rank matrix from few observed entries while incorporating prior knowledge about its row and column subspaces. In this work, we make three contributions to the IMC problem: (i) we prove that under suitable conditions, the IMC optimization landscape has no bad local minima; (ii) we derive a simple scheme with theoretical guarantees to estimate the rank of the unknown matrix; and (iii) we propose GNIMC, a simple Gauss-Newton based method to solve the IMC problem, analyze its runtime and derive for it strong recovery guarantees. The guarantees for GNIMC are sharper in several aspects than those available for other methods, including a quadratic convergence rate, fewer required observed entries and stability to errors or deviations from low-rank. Empirically, given entries observed uniformly at random, GNIMC recovers the underlying matrix substantially faster than several competing methods.' volume: 162 URL: https://proceedings.mlr.press/v162/zilber22a.html PDF: https://proceedings.mlr.press/v162/zilber22a/zilber22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zilber22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Pini family: Zilber - given: Boaz family: Nadler editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27671-27692 id: zilber22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27671 lastpage: 27692 published: 2022-06-28 00:00:00 +0000 - title: 'Counterfactual Prediction for Outcome-Oriented Treatments' abstract: 'Considerable effort has been devoted to learning counterfactual treatment outcomes under various settings, including binary/continuous/multiple treatments. Most of this literature aims to minimize the estimation error of the counterfactual outcome over the whole treatment space. However, in most scenarios where the counterfactual prediction model is used to assist decision-making, people are only concerned with the small fraction of treatments that can potentially induce a superior outcome (i.e., outcome-oriented treatments). This gap in objectives is even more severe when the number of possible treatments is large, for example under the continuous treatment setting.
To overcome this, we establish a new objective of optimizing counterfactual prediction on outcome-oriented treatments, propose a novel Outcome-Oriented Sample Re-weighting (OOSR) method to make the predictive model concentrate more on outcome-oriented treatments, and show theoretically that our method can improve treatment selection towards the optimal one. Extensive experimental results on both synthetic datasets and semi-synthetic datasets demonstrate the effectiveness of our method.' volume: 162 URL: https://proceedings.mlr.press/v162/zou22a.html PDF: https://proceedings.mlr.press/v162/zou22a/zou22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zou22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Hao family: Zou - given: Bo family: Li - given: Jiangang family: Han - given: Shuiping family: Chen - given: Xuetao family: Ding - given: Peng family: Cui editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27693-27706 id: zou22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27693 lastpage: 27706 published: 2022-06-28 00:00:00 +0000 - title: 'SpaceMAP: Visualizing High-Dimensional Data by Space Expansion' abstract: 'Dimensionality reduction (DR) of high-dimensional data is of theoretical and practical interest in machine learning. However, there exist intriguing, non-intuitive discrepancies between the geometry of high- and low-dimensional space. We look into such discrepancies and propose a novel visualization method called Space-based Manifold Approximation and Projection (SpaceMAP). Our method establishes an analytical transformation on distance metrics between spaces to address the “crowding problem” in DR. With the proposed equivalent extended distance (EED), we are able to match the capacity of high- and low-dimensional space in a principled manner. To handle complex data with different manifold properties, we propose hierarchical manifold approximation to model the similarity function in a data-specific manner. We evaluated SpaceMAP on a range of synthetic and real datasets with varying manifold properties, and demonstrated its excellent performance in comparison with classical and state-of-the-art DR methods. In particular, the concept of space expansion provides a generic framework for understanding nonlinear DR methods, including the popular t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP).' volume: 162 URL: https://proceedings.mlr.press/v162/zu22a.html PDF: https://proceedings.mlr.press/v162/zu22a/zu22a.pdf edit: https://github.com/mlresearch//v162/edit/gh-pages/_posts/2022-06-28-zu22a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the 39th International Conference on Machine Learning' publisher: 'PMLR' author: - given: Xinrui family: Zu - given: Qian family: Tao editor: - given: Kamalika family: Chaudhuri - given: Stefanie family: Jegelka - given: Le family: Song - given: Csaba family: Szepesvari - given: Gang family: Niu - given: Sivan family: Sabato page: 27707-27723 id: zu22a issued: date-parts: - 2022 - 6 - 28 firstpage: 27707 lastpage: 27723 published: 2022-06-28 00:00:00 +0000