- title: ' Cross-Loss Influence Functions to Explain Deep Network Representations '
  abstract: ' As machine learning is increasingly deployed in the real world, it is paramount that we develop the tools necessary to analyze the decision-making of the models we train and deploy to end-users. Recently, researchers have shown that influence functions, a statistical measure of sample impact, can approximate the effects of training samples on classification accuracy for deep neural networks. However, this prior work only applies to supervised learning, where training and testing share an objective function. No approaches currently exist for estimating the influence of unsupervised training examples for deep learning models. To bring explainability to unsupervised and semi-supervised training regimes, we derive the first theoretical and empirical demonstration that influence functions can be extended to handle mismatched training and testing (i.e., "cross-loss") settings. Our formulation enables us to compute the influence in an unsupervised learning setup, explain cluster memberships, and identify and augment biases in language models. Our experiments show that our cross-loss influence estimates even exceed matched-objective influence estimation relative to ground-truth sample impact. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/silva22a.html
  PDF: https://proceedings.mlr.press/v151/silva22a/silva22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-silva22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Andrew
    family: Silva
  - given: Rohit
    family: Chopra
  - given: Matthew
    family: Gombolay
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1-17
  id: silva22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1
  lastpage: 17
  published: 2022-05-03 00:00:00 +0000
- title: ' Federated Reinforcement Learning with Environment Heterogeneity '
  abstract: ' We study the Federated Reinforcement Learning (FedRL) problem, in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. In this paper, we stress the constraint of environment heterogeneity, which means that the $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function which optimizes the overall performance in all environments, we propose two federated RL algorithms, QAvg and PAvg. We theoretically prove that these algorithms converge to suboptimal solutions, while such suboptimality depends on how heterogeneous these $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves the training but also allows for better generalization to new environments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/jin22a.html
  PDF: https://proceedings.mlr.press/v151/jin22a/jin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-jin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hao
    family: Jin
  - given: Yang
    family: Peng
  - given: Wenhao
    family: Yang
  - given: Shusen
    family: Wang
  - given: Zhihua
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 18-37
  id: jin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 18
  lastpage: 37
  published: 2022-05-03 00:00:00 +0000
- title: ' On Linear Model with Markov Signal Priors '
  abstract: ' In this paper, we estimate the free energy, average mutual information, and minimum mean square error (MMSE) of a linear model under the assumption that the source is generated by a Markov chain. Our estimates are based on the replica method in statistical physics. We show that under the MMSE estimator, the linear model with Markov sources or hidden Markov sources is decoupled into single-input AWGN channels with state information available at both encoder and decoder, where the state distribution follows the stationary distribution of the stochastic matrix of the Markov chain. Numerical results show that the free energies and MSEs obtained via the replica method closely approximate their counterparts obtained via MCMC simulations. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/truong22a.html
  PDF: https://proceedings.mlr.press/v151/truong22a/truong22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-truong22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lan V.
    family: Truong
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 38-53
  id: truong22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 38
  lastpage: 53
  published: 2022-05-03 00:00:00 +0000
- title: ' Maillard Sampling: Boltzmann Exploration Done Optimally '
  abstract: ' The PhD thesis of Maillard (2013) presents a rather obscure algorithm for the $K$-armed bandit problem. This less-known algorithm, which we call Maillard sampling (MS), computes the probability of choosing each arm in a closed form, which is not true for Thompson sampling, a widely-adopted bandit algorithm in the industry. This means that the bandit-logged data from running MS can be readily used for counterfactual evaluation, unlike Thompson sampling. Motivated by such merit, we revisit MS and perform an improved analysis to show that it achieves both the asymptotical optimality and $\sqrt{KT\log{T}}$ minimax regret bound where $T$ is the time horizon, which matches the known bounds for asymptotically optimal UCB. We then propose a variant of MS called MS$^+$ that improves its minimax bound to $\sqrt{KT\log{K}}$. MS$^+$ can also be tuned to be aggressive (i.e., less exploration) without losing the asymptotic optimality, a unique feature unavailable from existing bandit algorithms. Our numerical evaluation shows the effectiveness of MS$^+$. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bian22a.html
  PDF: https://proceedings.mlr.press/v151/bian22a/bian22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bian22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jie
    family: Bian
  - given: Kwang-Sung
    family: Jun
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 54-72
  id: bian22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 54
  lastpage: 72
  published: 2022-05-03 00:00:00 +0000
- title: ' Norm-Agnostic Linear Bandits '
  abstract: ' Linear bandits have a wide variety of applications including recommendation systems yet they make one strong assumption: the algorithms must know an upper bound $S$ on the norm of the unknown parameter $\theta^*$ that governs the reward generation. Such an assumption forces the practitioner to guess $S$ involved in the confidence bound, leaving no choice but to wish that $\|\theta^*\|\le S$ is true to guarantee that the regret will be low. In this paper, we propose novel algorithms that do not require such knowledge for the first time. Specifically, we propose two algorithms and analyze their regret bounds: one for the changing arm set setting and the other for the fixed arm set setting. Our regret bound for the former shows that the price of not knowing $S$ does not affect the leading term in the regret bound and inflates only the lower order term. For the latter, we do not pay any price in the regret for not knowing $S$. Our numerical experiments show standard algorithms assuming knowledge of $S$ can fail catastrophically when $\|\theta^*\|\le S$ is not true whereas our algorithms enjoy low regret. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/gales22a.html
  PDF: https://proceedings.mlr.press/v151/gales22a/gales22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-gales22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Spencer B.
    family: Gales
  - given: Sunder
    family: Sethuraman
  - given: Kwang-Sung
    family: Jun
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 73-91
  id: gales22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 73
  lastpage: 91
  published: 2022-05-03 00:00:00 +0000
- title: ' Gaussian Process Bandit Optimization with Few Batches '
  abstract: ' In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$, where the $O^\ast(\cdot)$ notation hides dimension-independent logarithmic factors and $\gamma_T$ is the maximum information gain associated with the kernel. This bound is near-optimal for several kernels of interest and improves on the typical $O^\ast(\sqrt{T}\gamma_T)$ bound, and our approach is arguably the simplest among algorithms attaining this improvement. In addition, in the case of a constant number of batches (not depending on $T$), we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches, focusing on the squared exponential and Matern kernels. The algorithmic upper bounds are shown to be nearly minimax optimal via analogous algorithm-independent lower bounds. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/li22a.html
  PDF: https://proceedings.mlr.press/v151/li22a/li22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-li22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zihan
    family: Li
  - given: Jonathan
    family: Scarlett
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 92-107
  id: li22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 92
  lastpage: 107
  published: 2022-05-03 00:00:00 +0000
- title: ' Approximate Function Evaluation via Multi-Armed Bandits '
  abstract: ' We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\mu \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle. It is more sample-efficient to sample more frequently those components of $\mu$ that correspond to directions in which the function has larger directional derivatives. However, as $\mu$ is unknown, the optimal sampling frequencies are also unknown. We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\mu)$. We generalize our algorithm to adapt to heteroskedastic noise, and prove asymptotic optimality when $f$ is linear. We corroborate our theoretical results with numerical experiments, showing the dramatic gains afforded by adaptivity. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/baharav22a.html
  PDF: https://proceedings.mlr.press/v151/baharav22a/baharav22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-baharav22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tavor Z.
    family: Baharav
  - given: Gary
    family: Cheng
  - given: Mert
    family: Pilanci
  - given: David
    family: Tse
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 108-135
  id: baharav22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 108
  lastpage: 135
  published: 2022-05-03 00:00:00 +0000
- title: ' Unlabeled Data Help: Minimax Analysis and Adversarial Robustness '
  abstract: ' The recently proposed self-supervised learning (SSL) approaches successfully demonstrate the great potential of supplementing learning algorithms with additional unlabeled data. However, it is still unclear whether the existing SSL algorithms can fully utilize the information of both labeled and unlabeled data. This paper gives an affirmative answer for the reconstruction-based SSL algorithm (Lee et al., 2020) under several statistical models. While existing literature only focuses on establishing the upper bound of the convergence rate, we provide a rigorous minimax analysis, and successfully justify the rate-optimality of the reconstruction-based SSL algorithm under different data generation models. Furthermore, we incorporate the reconstruction-based SSL into the existing adversarial training algorithms and show that learning from unlabeled data helps improve the robustness. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/xing22a.html
  PDF: https://proceedings.mlr.press/v151/xing22a/xing22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-xing22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yue
    family: Xing
  - given: Qifan
    family: Song
  - given: Guang
    family: Cheng
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 136-168
  id: xing22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 136
  lastpage: 168
  published: 2022-05-03 00:00:00 +0000
- title: ' System-Agnostic Meta-Learning for MDP-based Dynamic Scheduling via Descriptive Policy '
  abstract: ' Dynamic scheduling is an important problem in applications from queuing to wireless networks. It addresses how to choose an item among multiple scheduling items in each timestep to achieve a long-term goal. Most of the conventional approaches for dynamic scheduling find the optimal policy for a given specific system so that the policy from these approaches is usable only for the corresponding system characteristics. Hence, it is hard to use such approaches for a practical system in which system characteristics dynamically change. This paper proposes a novel policy structure for MDP-based dynamic scheduling, a descriptive policy, which has a system-agnostic capability to adapt to unseen system characteristics for an identical task (dynamic scheduling). To this end, the descriptive policy learns a system-agnostic scheduling principle–in a nutshell, “which condition of items should have a higher priority in scheduling”. The scheduling principle can be applied to any system so that the descriptive policy learned in one system can be used for another system. Experiments with simple explanatory and realistic application scenarios demonstrate that it enables system-agnostic meta-learning with very little performance degradation. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lee22a.html
  PDF: https://proceedings.mlr.press/v151/lee22a/lee22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lee22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hyun-Suk
    family: Lee
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 169-187
  id: lee22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 169
  lastpage: 187
  published: 2022-05-03 00:00:00 +0000
- title: ' Deep Layer-wise Networks Have Closed-Form Weights '
  abstract: ' There is currently a debate within the neuroscience community over the likelihood of the brain performing backpropagation (BP). To better mimic the brain, training a network one layer at a time with only a "single forward pass" has been proposed as an alternative to bypass BP; we refer to these networks as "layer-wise" networks. We continue the work on layer-wise networks by answering two outstanding questions. First, do they have a closed-form solution? Second, how do we know when to stop adding more layers? This work proves that the "Kernel Mean Embedding" is the closed-form solution that achieves the network global optimum while driving these networks to converge towards a highly desirable kernel for classification; we call it the Neural Indicator Kernel. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tzu-wu22a.html
  PDF: https://proceedings.mlr.press/v151/tzu-wu22a/tzu-wu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tzu-wu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chieh
    family: Tzu Wu
  - given: Aria
    family: Masoomi
  - given: Arthur
    family: Gretton
  - given: Jennifer
    family: Dy
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 188-225
  id: tzu-wu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 188
  lastpage: 225
  published: 2022-05-03 00:00:00 +0000
- title: ' Sequential Multivariate Change Detection with Calibrated and Memoryless False Detection Rates '
  abstract: ' Responding appropriately to the detections of a sequential change detector requires knowledge of the rate at which false positives occur in the absence of change. Setting detection thresholds to achieve a desired false positive rate is challenging. Existing works resort to setting time-invariant thresholds that focus on the expected runtime of the detector in the absence of change, either bounding it loosely from below or targeting it directly but with asymptotic arguments that we show cause significant miscalibration in practice. We present a simulation-based approach to setting time-varying thresholds that allows a desired expected runtime to be accurately targeted whilst additionally keeping the false positive rate constant across time steps. Whilst the approach to threshold setting is metric agnostic, we show how the cost of using the popular quadratic time MMD estimator can be reduced from $O(N^2B)$ to $O(N^2+NB)$ during configuration and from $O(N^2)$ to $O(N)$ during operation, where $N$ and $B$ are the numbers of reference and bootstrap samples respectively. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cobb22a.html
  PDF: https://proceedings.mlr.press/v151/cobb22a/cobb22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cobb22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Oliver
    family: Cobb
  - given: Arnaud
    family: Van Looveren
  - given: Janis
    family: Klaise
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 226-239
  id: cobb22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 226
  lastpage: 239
  published: 2022-05-03 00:00:00 +0000
- title: ' Neural Contextual Bandits without Regret '
  abstract: ' Contextual bandits are a rich model for sequential decision making given side information, with important applications, e.g., in recommender systems. We propose novel algorithms for contextual bandits harnessing neural networks to approximate the unknown reward function. We resolve the open problem of proving sublinear regret bounds in this setting for general context sequences, considering both fully-connected and convolutional networks. To this end, we first analyze NTK-UCB, a kernelized bandit optimization algorithm employing the Neural Tangent Kernel (NTK), and bound its regret in terms of the NTK maximum information gain $\gamma_T$, a complexity parameter capturing the difficulty of learning. Our bounds on $\gamma_T$ for the NTK may be of independent interest. We then introduce our neural network based algorithm NN-UCB, and show that its regret closely tracks that of NTK-UCB. Under broad non-parametric assumptions about the reward function, our approach converges to the optimal policy at a $\tilde{\mathcal{O}}(T^{-1/2d})$ rate, where $d$ is the dimension of the context. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kassraie22a.html
  PDF: https://proceedings.mlr.press/v151/kassraie22a/kassraie22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kassraie22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Parnian
    family: Kassraie
  - given: Andreas
    family: Krause
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 240-278
  id: kassraie22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 240
  lastpage: 278
  published: 2022-05-03 00:00:00 +0000
- title: ' SAN: Stochastic Average Newton Algorithm for Minimizing Finite Sums '
  abstract: ' We present a principled approach for designing stochastic Newton methods for solving finite sum optimization problems. Our approach has two steps. First, we re-write the stationarity conditions as a system of nonlinear equations that associates each data point to a new row. Second, we apply a Subsampled Newton Raphson method to solve this system of nonlinear equations. Using our approach, we develop a new Stochastic Average Newton (SAN) method, which is incremental by design, in that it requires only a single data point per iteration. It is also cheap to implement when solving regularized generalized linear models, with a cost per iteration of the order of the number of parameters. We show through extensive numerical experiments that SAN requires neither knowledge about the problem nor parameter tuning, while remaining competitive with classical variance-reduced gradient methods (e.g. SAG and SVRG) and with incremental Newton and quasi-Newton methods (e.g. SNM, IQN). '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22a.html
  PDF: https://proceedings.mlr.press/v151/chen22a/chen22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiabin
    family: Chen
  - given: Rui
    family: Yuan
  - given: Guillaume
    family: Garrigos
  - given: Robert M.
    family: Gower
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 279-318
  id: chen22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 279
  lastpage: 318
  published: 2022-05-03 00:00:00 +0000
- title: ' Factorization Approach for Low-complexity Matrix Completion Problems: Exponential Number of Spurious Solutions and Failure of Gradient Methods '
  abstract: ' The Burer-Monteiro (B-M) factorization approach can efficiently solve low-rank matrix optimization problems under the Restricted Isometry Property (RIP) condition. It is natural to ask whether B-M factorization-based methods can succeed on any low-rank matrix optimization problem with low information-theoretic complexity, i.e., polynomial-time solvable problems that have a unique solution. We provide a negative answer to this question. We investigate the landscape of B-M factorized polynomial-time solvable matrix completion (MC) problems, which are the most popular subclass of low-rank matrix optimization problems without the RIP condition. We construct an instance of polynomial-time solvable MC problems with exponentially many spurious local minima, which leads to the failure of most gradient-based methods. We define a new complexity metric that measures the solvability of low-rank matrix optimization problems based on the B-M factorization approach. In addition, we show that more measurements can deteriorate the landscape, which further reveals the unfavorable behavior of B-M factorization. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yalcin22a.html
  PDF: https://proceedings.mlr.press/v151/yalcin22a/yalcin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yalcin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Baturalp
    family: Yalçın
  - given: Haixiang
    family: Zhang
  - given: Javad
    family: Lavaei
  - given: Somayeh
    family: Sojoudi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 319-341
  id: yalcin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 319
  lastpage: 341
  published: 2022-05-03 00:00:00 +0000
- title: ' k-experts - Online Policies and Fundamental Limits '
  abstract: ' We introduce the k-experts problem - a generalization of the classic Prediction with Expert’s Advice framework. Unlike the classic version, where the learner selects exactly one expert from a pool of N experts at each round, in this problem, the learner selects a subset of k experts at each round (1<= k <= N). The reward obtained by the learner at each round is assumed to be a function of the k selected experts. The primary objective is to design an online learning policy with a small regret. In this pursuit, we propose SAGE (Sampled Hedge) - a framework for designing efficient online learning policies by leveraging statistical sampling techniques. For a wide class of reward functions, we show that SAGE either achieves the first sublinear regret guarantee or improves upon the existing ones. Furthermore, going beyond the notion of regret, we fully characterize the mistake bounds achievable by online learning policies for stable loss functions. We conclude the paper by establishing a tight regret lower bound for a variant of the k-experts problem and carrying out experiments with standard datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mukhopadhyay22a.html
  PDF: https://proceedings.mlr.press/v151/mukhopadhyay22a/mukhopadhyay22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mukhopadhyay22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Samrat
    family: Mukhopadhyay
  - given: Sourav
    family: Sahoo
  - given: Abhishek
    family: Sinha
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 342-365
  id: mukhopadhyay22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 342
  lastpage: 365
  published: 2022-05-03 00:00:00 +0000
- title: ' Extragradient Method: O(1/K) Last-Iterate Convergence for Monotone Variational Inequalities and Connections With Cocoercivity '
  abstract: ' The extragradient method (EG) (Korpelevich, 1976) is one of the most popular methods for solving saddle point and variational inequality problems (VIP). Despite its long history and significant attention in the optimization community, there remain important open questions about the convergence of EG. In this paper, we resolve one of these questions and derive the first last-iterate O(1/K) convergence rate for EG for monotone and Lipschitz VIP without any additional assumptions on the operator, unlike the only known result of this type (Golowich et al., 2020), which relies on the Lipschitzness of the Jacobian of the operator. The rate is given in terms of reducing the squared norm of the operator. Moreover, we establish several results on the (non-)cocoercivity of the update operators of EG, Optimistic Gradient Method, and Hamiltonian Gradient Method, when the original operator is monotone and Lipschitz. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/gorbunov22a.html
  PDF: https://proceedings.mlr.press/v151/gorbunov22a/gorbunov22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-gorbunov22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Eduard
    family: Gorbunov
  - given: Nicolas
    family: Loizou
  - given: Gauthier
    family: Gidel
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 366-402
  id: gorbunov22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 366
  lastpage: 402
  published: 2022-05-03 00:00:00 +0000
- title: ' Multi-armed Bandit Algorithm against Strategic Replication '
  abstract: ' We consider a multi-armed bandit problem in which a set of arms is registered by each agent, and the agent receives reward when its arm is selected. An agent might strategically submit more arms with replications, which can bring more reward by abusing the bandit algorithm’s exploration-exploitation balance. Our analysis reveals that a standard algorithm indeed fails at preventing replication and suffers from linear regret in time $T$. We aim to design a bandit algorithm which demotivates replications and also achieves a small cumulative regret. We devise the replication-proof Hierarchical UCB (H-UCB), which has $O(\ln T)$-regret under any equilibrium. We further propose Robust Hierarchical UCB (RH-UCB), which has a sublinear regret even in a realistic scenario with irrational agents replicating carelessly. We verify our theoretical findings through numerical experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shin22a.html
  PDF: https://proceedings.mlr.press/v151/shin22a/shin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Suho
    family: Shin
  - given: Seungjoon
    family: Lee
  - given: Jungseul
    family: Ok
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 403-431
  id: shin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 403
  lastpage: 431
  published: 2022-05-03 00:00:00 +0000
- title: ' Gap-Dependent Bounds for Two-Player Markov Games '
  abstract: ' As one of the most popular methods in the field of reinforcement learning, Q-learning has received increasing attention. Recently, there have been more theoretical works on the regret bound of algorithms that belong to the Q-learning class in different settings. In this paper, we analyze the cumulative regret when running the Nash Q-learning algorithm on 2-player turn-based stochastic Markov games (2-TBSG), and propose the very first gap-dependent logarithmic upper bounds in the episodic tabular setting. This bound matches the theoretical lower bound up to a logarithmic term. Furthermore, we extend the conclusion to the discounted game setting with infinite horizon and propose a similar gap-dependent logarithmic regret bound. Also, under the linear MDP assumption, we obtain another logarithmic regret for 2-TBSG, in both centralized and independent settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dou22a.html
  PDF: https://proceedings.mlr.press/v151/dou22a/dou22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dou22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zehao
    family: Dou
  - given: Zhuoran
    family: Yang
  - given: Zhaoran
    family: Wang
  - given: Simon
    family: Du
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 432-455
  id: dou22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 432
  lastpage: 455
  published: 2022-05-03 00:00:00 +0000
- title: ' Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits '
  abstract: ' We introduce the Correlated Preference Bandits problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of $n$ items through online subsetwise preference feedback. We investigate whether models with a simple correlation structure, e.g. low rank, can result in faster learning rates. While we show that the problem can be impossible to solve for the general ‘low rank’ choice models, faster learning rates can be attained assuming more structured item correlations. In particular, we introduce a new class of Block-Rank based RUM model, where the best item is shown to be $(\epsilon,\delta)$-PAC learnable with only $O(r \epsilon^{-2} \log(n/\delta))$ samples. This improves on the standard sample complexity bound of $\tilde{O}(n\epsilon^{-2} \log(1/\delta))$ known for the usual learning algorithms which might not exploit the item-correlations ($r \ll n$). We complement the above sample complexity with a matching lower bound (up to logarithmic factors), justifying the tightness of our analysis. Further, we extend the results to a more general noisy Block-Rank model, which ensures robustness of our techniques. Overall, our results justify the advantage of playing subsetwise queries over pairwise preferences $(k=2)$: we show that the latter provably fails to exploit correlation. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/saha22a.html
  PDF: https://proceedings.mlr.press/v151/saha22a/saha22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-saha22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Aadirupa
    family: Saha
  - given: Suprovat
    family: Ghoshal
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 456-482
  id: saha22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 456
  lastpage: 482
  published: 2022-05-03 00:00:00 +0000
- title: ' Exploring Image Regions Not Well Encoded by an INN '
  abstract: ' This paper proposes a method to clarify image regions that are not well encoded by an invertible neural network (INN), i.e., image regions that significantly decrease the likelihood of the input image. The proposed method can diagnose the limitation of the representation capacity of an INN. Given an input image, our method extracts image regions, which are not well encoded, by maximizing the likelihood of the image. We explicitly model the distribution of not-well-encoded regions. A metric is proposed to evaluate the extraction of the not-well-encoded regions. Finally, we use the proposed method to analyze several state-of-the-art INNs trained on various benchmark datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ling22a.html
  PDF: https://proceedings.mlr.press/v151/ling22a/ling22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ling22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zenan
    family: Ling
  - given: Fan
    family: Zhou
  - given: Meng
    family: Wei
  - given: Quanshi
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 483-509
  id: ling22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 483
  lastpage: 509
  published: 2022-05-03 00:00:00 +0000
- title: ' Finding Dynamics Preserving Adversarial Winning Tickets '
  abstract: ' Modern deep neural networks (DNNs) are vulnerable to adversarial attacks, and adversarial training has been shown to be a promising method for improving the adversarial robustness of DNNs. Pruning methods have been considered in the adversarial context to reduce model capacity and improve adversarial robustness simultaneously in training. Existing adversarial pruning methods generally mimic the classical pruning methods for natural training, which follow the three-stage ’training, pruning, fine-tuning’ pipeline. We observe that such pruning methods do not necessarily preserve the dynamics of dense networks, making it potentially hard to fine-tune them to compensate for the accuracy degradation caused by pruning. Based on recent works on the neural tangent kernel (NTK), we systematically study the dynamics of adversarial training and prove the existence of a trainable sparse sub-network at initialization which can be trained to be adversarially robust from scratch. This theoretically verifies the lottery ticket hypothesis in the adversarial context, and we refer to such a sub-network structure as an adversarial winning ticket (AWT). We also show empirical evidence that AWT preserves the dynamics of adversarial training and achieves equal performance to dense adversarial training. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shi22a.html
  PDF: https://proceedings.mlr.press/v151/shi22a/shi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xupeng
    family: Shi
  - given: Pengfei
    family: Zheng
  - given: A. Adam
    family: Ding
  - given: Yuan
    family: Gao
  - given: Weizhong
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 510-528
  id: shi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 510
  lastpage: 528
  published: 2022-05-03 00:00:00 +0000
- title: ' Being a Bit Frequentist Improves Bayesian Neural Networks '
  abstract: ' Despite their compelling theoretical properties, Bayesian neural networks (BNNs) tend to perform worse than frequentist methods in classification-based uncertainty quantification (UQ) tasks such as out-of-distribution (OOD) detection. In this paper, based on empirical findings in prior works, we hypothesize that this issue arises because even recent Bayesian methods have never considered OOD data in their training processes, even though this “OOD training” technique is an integral part of state-of-the-art frequentist UQ methods. To validate this, we treat OOD data as a first-class citizen in BNN training by exploring four different ways of incorporating OOD data into Bayesian inference. We show in extensive experiments that OOD-trained BNNs are competitive with recent frequentist baselines. This work thus provides strong baselines for future work in Bayesian UQ. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kristiadi22a.html
  PDF: https://proceedings.mlr.press/v151/kristiadi22a/kristiadi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kristiadi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Agustinus
    family: Kristiadi
  - given: Matthias
    family: Hein
  - given: Philipp
    family: Hennig
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 529-545
  id: kristiadi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 529
  lastpage: 545
  published: 2022-05-03 00:00:00 +0000
- title: ' Jointly Efficient and Optimal Algorithms for Logistic Bandits '
  abstract: ' Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by exponentially large factors. Such algorithms are however strikingly costly as they require $\Omega(t)$ operations at each round. On the other hand, a different line of research focused on computational efficiency ($\mathcal{O}(1)$ per-round cost), but at the cost of letting go of the aforementioned exponential improvements. Obtaining the best of both worlds is unfortunately not a matter of marrying both approaches. Instead we introduce a new learning procedure for Logistic Bandits. It yields confidence sets whose sufficient statistics can be easily maintained online without sacrificing statistical tightness. Combined with efficient planning mechanisms, we design fast algorithms whose regret performance still matches the problem-dependent lower bound of Abeille et al. (2021). To the best of our knowledge, those are the first Logistic Bandit algorithms that simultaneously enjoy statistical and computational efficiency. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/faury22a.html
  PDF: https://proceedings.mlr.press/v151/faury22a/faury22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-faury22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Louis
    family: Faury
  - given: Marc
    family: Abeille
  - given: Kwang-Sung
    family: Jun
  - given: Clement
    family: Calauzenes
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 546-580
  id: faury22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 546
  lastpage: 580
  published: 2022-05-03 00:00:00 +0000
- title: ' Obtaining Causal Information by Merging Datasets with MAXENT '
  abstract: ' The investigation of the question "which treatment has a causal effect on a target variable?" is of particular relevance in a large number of scientific disciplines. This challenging task becomes even more difficult if some treatment variables were not, or even cannot be, observed jointly with the target variable. In this paper, we discuss how causal knowledge can be obtained without having observed all variables jointly, but by merging the statistical information from different datasets. We show how the maximum entropy principle can be used to identify edges among random variables when assuming causal sufficiency and an extended version of faithfulness, and when only subsets of the variables have been observed jointly. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/garrido-mejia22a.html
  PDF: https://proceedings.mlr.press/v151/garrido-mejia22a/garrido-mejia22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-garrido-mejia22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sergio H.
    family: Garrido Mejia
  - given: Elke
    family: Kirschbaum
  - given: Dominik
    family: Janzing
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 581-603
  id: garrido-mejia22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 581
  lastpage: 603
  published: 2022-05-03 00:00:00 +0000
- title: ' Distributed Sparse Multicategory Discriminant Analysis '
  abstract: ' This paper proposes a convex formulation for sparse multicategory linear discriminant analysis and then extends it to the distributed setting when data are stored across multiple sites. The key observation is that for the purpose of classification it suffices to recover the discriminant subspace which is invariant to orthogonal transformations. Theoretically, we establish statistical properties ensuring that the distributed sparse multicategory linear discriminant analysis performs as well as the centralized version after a few rounds of communications. Numerical studies lend strong support to our methodology and theory. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22b.html
  PDF: https://proceedings.mlr.press/v151/chen22b/chen22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hengchao
    family: Chen
  - given: Qiang
    family: Sun
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 604-624
  id: chen22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 604
  lastpage: 624
  published: 2022-05-03 00:00:00 +0000
- title: ' Probabilistic Numerical Method of Lines for Time-Dependent Partial Differential Equations '
  abstract: ' This work develops a class of probabilistic algorithms for the numerical solution of nonlinear, time-dependent partial differential equations (PDEs). Current state-of-the-art PDE solvers treat the space- and time-dimensions separately, serially, and with black-box algorithms, which obscures the interactions between spatial and temporal approximation errors and misguides the quantification of the overall error. To fix this issue, we introduce a probabilistic version of a technique called method of lines. The proposed algorithm begins with a Gaussian process interpretation of finite difference methods, which then interacts naturally with filtering-based probabilistic ordinary differential equation (ODE) solvers because they share a common language: Bayesian inference. Joint quantification of space- and time-uncertainty becomes possible without losing the performance benefits of well-tuned ODE solvers. Thereby, we extend the toolbox of probabilistic programs for differential equation simulation to PDEs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kramer22a.html
  PDF: https://proceedings.mlr.press/v151/kramer22a/kramer22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kramer22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nicholas
    family: Krämer
  - given: Jonathan
    family: Schmidt
  - given: Philipp
    family: Hennig
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 625-639
  id: kramer22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 625
  lastpage: 639
  published: 2022-05-03 00:00:00 +0000
- title: ' A View of Exact Inference in Graphs from the Degree-4 Sum-of-Squares Hierarchy '
  abstract: ' Performing inference in graphs is a common task within several machine learning problems, e.g., image segmentation and community detection, among others. For a given undirected connected graph, we tackle the statistical problem of exactly recovering an unknown ground-truth binary labeling of the nodes from a single corrupted observation of each edge. Such a problem can be formulated as a quadratic combinatorial optimization problem over the boolean hypercube, where it has been shown before that one can (with high probability and in polynomial time) exactly recover the ground-truth labeling of graphs that have an isoperimetric number that grows with respect to the number of nodes (e.g., complete graphs, regular expanders). In this work, we apply a powerful hierarchy of relaxations, known as the sum-of-squares (SoS) hierarchy, to the combinatorial problem. Motivated by empirical evidence on the improvement in exact recoverability, we center our attention on the degree-4 SoS relaxation and set out to understand the origin of such improvement from a graph theoretical perspective. We show that the solution of the dual of the relaxed problem is related to finding edge weights of the Johnson and Kneser graphs, where the weights fulfill the SoS constraints and intuitively allow the input graph to increase its algebraic connectivity. Finally, as a byproduct of our analysis, we derive a novel Cheeger-type lower bound for the algebraic connectivity of graphs with signed edge weights. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bello22a.html
  PDF: https://proceedings.mlr.press/v151/bello22a/bello22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bello22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kevin
    family: Bello
  - given: Chuyang
    family: Ke
  - given: Jean
    family: Honorio
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 640-654
  id: bello22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 640
  lastpage: 654
  published: 2022-05-03 00:00:00 +0000
- title: ' Marginalized Operators for Off-policy Reinforcement Learning '
  abstract: ' In this work, we propose marginalized operators, a new class of off-policy evaluation operators for reinforcement learning. Marginalized operators strictly generalize generic multi-step operators, such as Retrace, as special cases. Marginalized operators also suggest a form of sample-based estimates with potential variance reduction, compared to sample-based estimates of the original multi-step operators. We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases. Finally, we empirically demonstrate that marginalized operators provide performance gains to off-policy evaluation problems and downstream policy optimization algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tang22a.html
  PDF: https://proceedings.mlr.press/v151/tang22a/tang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yunhao
    family: Tang
  - given: Mark
    family: Rowland
  - given: Remi
    family: Munos
  - given: Michal
    family: Valko
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 655-679
  id: tang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 655
  lastpage: 679
  published: 2022-05-03 00:00:00 +0000
- title: ' Basis Matters: Better Communication-Efficient Second Order Methods for Federated Learning '
  abstract: ' Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local rates and low communication cost compared to first order methods. We discover that the communication cost of these methods can be further reduced, sometimes dramatically so, with a surprisingly simple trick: Basis Learn (BL). The idea is to transform the usual representation of the local Hessians via a change of basis in the space of matrices and apply compression tools to the new representation. To demonstrate the potential of using custom bases, we design a new Newton-type method (BL1), which reduces communication cost via both BL technique and bidirectional compression mechanism. Furthermore, we present two alternative extensions (BL2 and BL3) to partial participation to accommodate federated learning applications. We prove local linear and superlinear rates independent of the condition number. Finally, we support our claims with numerical experiments by comparing several first and second order methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/qian22a.html
  PDF: https://proceedings.mlr.press/v151/qian22a/qian22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-qian22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xun
    family: Qian
  - given: Rustem
    family: Islamov
  - given: Mher
    family: Safaryan
  - given: Peter
    family: Richtarik
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 680-720
  id: qian22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 680
  lastpage: 720
  published: 2022-05-03 00:00:00 +0000
- title: ' Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations '
  abstract: ' We perform scalable approximate inference in continuous-depth Bayesian neural networks. In this model class, uncertainty about separate weights in each layer gives hidden units that follow a stochastic differential equation. We demonstrate gradient-based stochastic variational inference in this infinite-parameter setting, producing arbitrarily-flexible approximate posteriors. We also derive a novel gradient estimator that approaches zero variance as the approximate posterior over weights approaches the true posterior. This approach brings continuous-depth Bayesian neural nets to a competitive comparison against discrete-depth alternatives, while inheriting the memory-efficient training and tunable precision of Neural ODEs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/xu22a.html
  PDF: https://proceedings.mlr.press/v151/xu22a/xu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-xu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Winnie
    family: Xu
  - given: Ricky T. Q.
    family: Chen
  - given: Xuechen
    family: Li
  - given: David
    family: Duvenaud
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 721-738
  id: xu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 721
  lastpage: 738
  published: 2022-05-03 00:00:00 +0000
- title: ' Causally motivated shortcut removal using auxiliary labels '
  abstract: ' Shortcut learning, in which models make use of easy-to-represent but unstable associations, is a major failure mode for robust machine learning. We study a flexible, causally-motivated approach to training robust predictors by discouraging the use of specific shortcuts, focusing on a common setting where a robust predictor could achieve optimal i.i.d. generalization in principle, but is overshadowed by a shortcut predictor in practice. Our approach uses auxiliary labels, typically available at training time, to enforce conditional independences implied by the causal graph. We show both theoretically and empirically that causally-motivated regularization schemes (a) lead to more robust estimators that generalize well under distribution shift, and (b) have better finite sample efficiency compared to usual regularization schemes, even when no shortcut is present. Our analysis highlights important theoretical properties of training techniques commonly used in the causal inference, fairness, and disentanglement literatures. Our code is available at github.com/mymakar/causally_motivated_shortcut_removal '
  volume: 151
  URL: https://proceedings.mlr.press/v151/makar22a.html
  PDF: https://proceedings.mlr.press/v151/makar22a/makar22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-makar22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Maggie
    family: Makar
  - given: Ben
    family: Packer
  - given: Dan
    family: Moldovan
  - given: Davis
    family: Blalock
  - given: Yoni
    family: Halpern
  - given: Alexander
    family: D’Amour
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 739-766
  id: makar22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 739
  lastpage: 766
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback '
  abstract: ' Motivated by problems of ranking with partial information, we introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence algorithms for it. Our analysis delivers tight regret bounds which, when specialized to standard cascading bandits, result in sharper guarantees than previously available in the literature. We evaluate our algorithms against a representative sample of cascading bandit baselines on a number of real-world datasets and show significantly improved empirical performance. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/santara22a.html
  PDF: https://proceedings.mlr.press/v151/santara22a/santara22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-santara22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Anirban
    family: Santara
  - given: Gaurav
    family: Aggarwal
  - given: Shuai
    family: Li
  - given: Claudio
    family: Gentile
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 767-797
  id: santara22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 767
  lastpage: 797
  published: 2022-05-03 00:00:00 +0000
- title: ' Identity Testing of Reversible Markov Chains '
  abstract: ' We consider the problem of identity testing of Markov chain transition matrices based on a single trajectory of observations under the distance notion introduced by Daskalakis et al. (2018a) and further analyzed by Cherapanamjeri and Bartlett (2019). Both works made the restrictive assumption that the Markov chains under consideration are symmetric. In this work we relax the symmetry assumption and show that it is possible to perform identity testing under the much weaker assumption of reversibility, provided that the stationary distributions of the reference and of the unknown Markov chains are close under a distance notion related to the separation distance. Additionally, we provide intuition on the distance notion of Daskalakis et al. (2018a) by showing how it behaves under several natural operations. In particular, we address some of their open questions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/fried22a.html
  PDF: https://proceedings.mlr.press/v151/fried22a/fried22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-fried22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sela
    family: Fried
  - given: Geoffrey
    family: Wolfer
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 798-817
  id: fried22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 798
  lastpage: 817
  published: 2022-05-03 00:00:00 +0000
- title: ' Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data '
  abstract: ' Practical tools for clustering streaming data must be fast enough to handle the arrival rate of the observations. Typically, they also must adapt on the fly to possible lack of stationarity; i.e., the data statistics may be time-dependent due to various forms of drifts, changes in the number of clusters, etc. The Dirichlet Process Mixture Model (DPMM), whose Bayesian nonparametric nature allows it to adapt its complexity to the data, seems a natural choice for the streaming-data case. In its classical formulation, however, the DPMM cannot capture common types of drifts in the data statistics. Moreover, and regardless of that limitation, existing methods for online DPMM inference are too slow to handle rapid data streams. In this work we propose adapting both the DPMM and a known DPMM sampling-based non-streaming inference method for streaming-data clustering. We demonstrate the utility of the proposed method on several challenging settings, where it obtains state-of-the-art results while being on par with other methods in terms of speed. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dinari22a.html
  PDF: https://proceedings.mlr.press/v151/dinari22a/dinari22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dinari22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Or
    family: Dinari
  - given: Oren
    family: Freifeld
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 818-835
  id: dinari22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 818
  lastpage: 835
  published: 2022-05-03 00:00:00 +0000
- title: ' A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning '
  abstract: ' Evolutionary strategies have recently been shown to achieve competitive levels of performance for complex optimization problems in reinforcement learning. In such problems, one often needs to optimize an objective function subject to a set of constraints, including for instance constraints on the entropy of a policy or restrictions on the possible set of actions or states accessible to an agent. Convergence guarantees for evolutionary strategies to optimize stochastic constrained problems are however lacking in the literature. In this work, we address this problem by designing a novel optimization algorithm with a sufficient decrease mechanism that ensures convergence and that is based only on estimates of the functions. We demonstrate the applicability of this algorithm on two types of experiments: i) a control task for maximizing rewards and ii) maximizing rewards subject to a non-relaxable set of constraints. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/diouane22a.html
  PDF: https://proceedings.mlr.press/v151/diouane22a/diouane22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-diouane22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Youssef
    family: Diouane
  - given: Aurelien
    family: Lucchi
  - given: Vihang
    family: Prakash Patil
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 836-859
  id: diouane22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 836
  lastpage: 859
  published: 2022-05-03 00:00:00 +0000
- title: ' Adaptively Partitioning Max-Affine Estimators for Convex Regression '
  abstract: ' This paper considers convex shape-restricted nonparametric regression over subgaussian domain and noise with the squared loss. It introduces a tractable convex piecewise-linear estimator which precomputes a partition of the training data by an adaptive version of farthest-point clustering, approximately fits hyperplanes over the partition cells by minimizing the regularized empirical risk, and projects the result into the max-affine class. The analysis provides an upper bound on the generalization error of this estimator matching the rate of Lipschitz nonparametric regression and proves its adaptivity to the intrinsic dimension of the data mitigating the effect of the curse of dimensionality. The experiments conclude with competitive performance, improved overfitting robustness, and significant computational savings compared to existing convex regression methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/balazs22a.html
  PDF: https://proceedings.mlr.press/v151/balazs22a/balazs22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-balazs22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Gábor
    family: Balázs
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 860-874
  id: balazs22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 860
  lastpage: 874
  published: 2022-05-03 00:00:00 +0000
- title: ' Variational Marginal Particle Filters '
  abstract: ' Variational inference for state space models (SSMs) is known to be hard in general. Recent works focus on deriving variational objectives for SSMs from unbiased sequential Monte Carlo estimators. We reveal that the marginal particle filter is obtained from sequential Monte Carlo by applying Rao-Blackwellization operations, which sacrifices the trajectory information for reduced variance and differentiability. We propose the variational marginal particle filter (VMPF), which is a differentiable and reparameterizable variational filtering objective for SSMs based on an unbiased estimator. We find that VMPF with biased gradients gives tighter bounds than previous objectives, and the unbiased reparameterization gradients are sometimes beneficial. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lai22a.html
  PDF: https://proceedings.mlr.press/v151/lai22a/lai22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lai22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jinlin
    family: Lai
  - given: Justin
    family: Domke
  - given: Daniel
    family: Sheldon
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 875-895
  id: lai22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 875
  lastpage: 895
  published: 2022-05-03 00:00:00 +0000
- title: ' On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms '
  abstract: ' The filtering-clustering models, including trend filtering and convex clustering, have become an important source of ideas and modeling tools in machine learning and related fields. The statistical guarantee of optimal solutions in these models has been extensively studied yet the investigations on the computational aspect have remained limited. In particular, practitioners often employ first-order algorithms in real-world applications and are impressed by their superior performance regardless of ill-conditioned structures of difference operator matrices, thus leaving open the problem of understanding the convergence property of first-order algorithms. This paper settles this open problem and contributes to the broad interplay between statistics and optimization by identifying a global error bound condition, which is satisfied by a large class of dual filtering-clustering problems, and designing a class of generalized dual gradient ascent algorithms, which are optimal first-order algorithms in deterministic, finite-sum and online settings. Our results are new and help explain why the filtering-clustering models can be efficiently solved by first-order algorithms. We also provide the detailed convergence rate analysis for the proposed algorithms in different settings, shedding light on their potential to solve the filtering-clustering models efficiently. We also conduct experiments on real datasets and the numerical results demonstrate the effectiveness of our algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ho22a.html
  PDF: https://proceedings.mlr.press/v151/ho22a/ho22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ho22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nhat
    family: Ho
  - given: Tianyi
    family: Lin
  - given: Michael
    family: Jordan
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 896-921
  id: ho22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 896
  lastpage: 921
  published: 2022-05-03 00:00:00 +0000
- title: ' Robustness and Reliability When Training With Noisy Labels '
  abstract: ' Labelling of data for supervised learning can be costly and time-consuming and the risk of incorporating label noise in large data sets is imminent. When training a flexible discriminative model using a strictly proper loss, such noise will inevitably shift the solution towards the conditional distribution over noisy labels. Nevertheless, while deep neural networks have proven capable of fitting random labels, regularisation and the use of robust loss functions empirically mitigate the effects of label noise. However, such observations concern robustness in accuracy, which is insufficient if reliable uncertainty quantification is critical. We demonstrate this by analysing the properties of the conditional distribution over noisy labels for an input-dependent noise model. In addition, we evaluate the set of robust loss functions characterised by noise-insensitive, asymptotic risk minimisers. We find that strictly proper and robust loss functions both offer asymptotic robustness in accuracy, but neither guarantees that the final model is calibrated. Moreover, even with robust loss functions, overfitting is an issue in practice. With these results, we aim to explain observed robustness of common training practices, such as early stopping, to label noise. In addition, we aim to encourage the development of new noise-robust algorithms that not only preserve accuracy but that also ensure reliability. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/olmin22a.html
  PDF: https://proceedings.mlr.press/v151/olmin22a/olmin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-olmin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Amanda
    family: Olmin
  - given: Fredrik
    family: Lindsten
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 922-942
  id: olmin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 922
  lastpage: 942
  published: 2022-05-03 00:00:00 +0000
- title: ' Robust Bayesian Inference for Simulator-based Models via the MMD Posterior Bootstrap '
  abstract: ' Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. Unfortunately, existing Bayesian approaches for simulators are known to perform poorly in those cases. In this paper, we propose a novel algorithm based on the posterior bootstrap and maximum mean discrepancy estimators. This leads to a highly-parallelisable Bayesian inference algorithm with strong robustness properties. This is demonstrated through an in-depth theoretical study which includes generalisation bounds and proofs of frequentist consistency and robustness of our posterior. The approach is then assessed on a range of examples including a g-and-k distribution and a toggle-switch model. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dellaporta22a.html
  PDF: https://proceedings.mlr.press/v151/dellaporta22a/dellaporta22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dellaporta22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Charita
    family: Dellaporta
  - given: Jeremias
    family: Knoblauch
  - given: Theodoros
    family: Damoulas
  - given: Francois-Xavier
    family: Briol
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 943-970
  id: dellaporta22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 943
  lastpage: 970
  published: 2022-05-03 00:00:00 +0000
- title: ' A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits '
  abstract: ' Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward of an arm is fully determined by the time elapsed since the arm last took part in a switch of actions. Our model generalizes previous notions of delay-dependent rewards, and also relaxes most assumptions on the reward function. This enables the modeling of phenomena such as progressive satiation and periodic behaviours. Building upon the Combinatorial Semi-Bandits (CSB) framework, we design an algorithm and prove a bound on its regret with respect to the optimal non-stationary policy (which is NP-hard to compute). Similarly to previous works, our regret analysis is based on defining and solving an appropriate trade-off between approximation and estimation. Preliminary experiments confirm the superiority of our algorithm over both the oracle greedy approach and a vanilla CSB solver. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/laforgue22a.html
  PDF: https://proceedings.mlr.press/v151/laforgue22a/laforgue22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-laforgue22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pierre
    family: Laforgue
  - given: Giulia
    family: Clerici
  - given: Nicolò
    family: Cesa-Bianchi
  - given: Ran
    family: Gilad-Bachrach
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 971-990
  id: laforgue22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 971
  lastpage: 990
  published: 2022-05-03 00:00:00 +0000
- title: ' Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation '
  abstract: ' Actor-critic methods integrating target networks have exhibited stupendous empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in the literature. In this paper, we reduce this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning closely inspired by practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/barakat22a.html
  PDF: https://proceedings.mlr.press/v151/barakat22a/barakat22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-barakat22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Anas
    family: Barakat
  - given: Pascal
    family: Bianchi
  - given: Julien
    family: Lehmann
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 991-1040
  id: barakat22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 991
  lastpage: 1040
  published: 2022-05-03 00:00:00 +0000
- title: ' Policy Learning and Evaluation with Randomized Quasi-Monte Carlo '
  abstract: ' Hard integrals arise frequently in reinforcement learning, for example when computing expectations in policy evaluation and policy iteration. They are often analytically intractable and typically estimated with Monte Carlo methods, whose sampling contributes to high variance in policy values and gradients. In this work, we propose to replace Monte Carlo samples with low-discrepancy point sets. We combine policy gradient methods with Randomized Quasi-Monte Carlo, yielding variance-reduced formulations of policy gradient and actor-critic algorithms. These formulations are effective for policy evaluation and policy improvement, as they outperform state-of-the-art algorithms on standardized continuous control benchmarks. Our empirical analyses validate the intuition that replacing Monte Carlo with Quasi-Monte Carlo yields significantly more accurate gradient estimates. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/arnold22a.html
  PDF: https://proceedings.mlr.press/v151/arnold22a/arnold22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-arnold22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sébastien M. R.
    family: Arnold
  - given: Pierre
    family: L’Ecuyer
  - given: Liyu
    family: Chen
  - given: Yi-Fan
    family: Chen
  - given: Fei
    family: Sha
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1041-1061
  id: arnold22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1041
  lastpage: 1061
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Personalized Item-to-Item Recommendation Metric via Implicit Feedback '
  abstract: ' This paper studies the item-to-item recommendation problem in recommender systems from a new perspective of metric learning via implicit feedback. We develop and investigate a personalizable deep metric model that captures both the internal contents of items and how they were interacted with by users. There are two key challenges in learning such a model. First, there is no explicit similarity annotation, which deviates from the assumption of most metric learning methods. Second, these approaches do not account for the fact that items are often represented by multiple sources of metadata and different users use different combinations of these sources to form their own notion of similarity. To address these challenges, we develop a new metric representation embedded as kernel parameters of a probabilistic model. This helps express the correlation between items that a user has interacted with, which can be used to predict user interaction with new items. Our approach hinges on the intuition that similar items induce similar interactions from the same user, thus fitting a metric-parameterized model to predict an implicit feedback signal could indirectly guide it towards finding the most suitable metric for each user. To this end, we also analyze how and when the proposed method is effective from a theoretical lens. Its empirical effectiveness is also demonstrated on several real-world datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hoang22a.html
  PDF: https://proceedings.mlr.press/v151/hoang22a/hoang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hoang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nghia
    family: Hoang
  - given: Anoop
    family: Deoras
  - given: Tong
    family: Zhao
  - given: Jin
    family: Li
  - given: George
    family: Karypis
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1062-1077
  id: hoang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1062
  lastpage: 1077
  published: 2022-05-03 00:00:00 +0000
- title: ' Multiway Spherical Clustering via Degree-Corrected Tensor Block Models '
  abstract: ' We consider the problem of multiway clustering in the presence of unknown degree heterogeneity. Such data problems arise commonly in applications such as recommendation systems, neuroimaging, community detection, and hypergraph partitions in social networks. The allowance of degree heterogeneity provides great flexibility in clustering models, but the extra complexity poses significant challenges in both statistics and computation. Here, we develop a degree-corrected tensor block model with estimation accuracy guarantees. We present the phase transition of clustering performance based on the notion of angle separability, and we characterize three signal-to-noise regimes corresponding to different statistical-computational behaviors. In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater. Further, we develop an efficient polynomial-time algorithm that provably achieves exact clustering under mild signal conditions. The efficacy of our procedure is demonstrated through both simulations and analyses of the Peru Legislation dataset. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hu22a.html
  PDF: https://proceedings.mlr.press/v151/hu22a/hu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiaxin
    family: Hu
  - given: Miaoyan
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1078-1119
  id: hu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1078
  lastpage: 1119
  published: 2022-05-03 00:00:00 +0000
- title: ' Fixed Support Tree-Sliced Wasserstein Barycenter '
  abstract: ' The Wasserstein barycenter has been widely studied in various fields, including natural language processing and computer vision. However, solving the Wasserstein barycenter problem incurs a high computational cost because the computation of the Wasserstein distance requires quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed support tree-Wasserstein barycenter (FS-TWB) and its extension, called the fixed support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems and can be solved by using the projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and objective function value by using the properties of tree-Wasserstein barycenter problems. Through real-world experiments, we show that, by using the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/takezawa22a.html
  PDF: https://proceedings.mlr.press/v151/takezawa22a/takezawa22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-takezawa22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuki
    family: Takezawa
  - given: Ryoma
    family: Sato
  - given: Zornitsa
    family: Kozareva
  - given: Sujith
    family: Ravi
  - given: Makoto
    family: Yamada
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1120-1137
  id: takezawa22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1120
  lastpage: 1137
  published: 2022-05-03 00:00:00 +0000
- title: ' k-Pareto Optimality-Based Sorting with Maximization of Choice '
  abstract: ' Topological sorting is an important technique in numerous practical applications, such as information retrieval, recommender systems, optimization, etc. In this paper, we introduce a problem of generalized topological sorting with maximization of choice, that is, of choosing a subset of items of a predefined size that contains the maximum number of equally preferable options (items) with respect to a dominance relation. We formulate this problem in a very abstract form and prove that sorting by k-Pareto optimality yields a valid solution. Next, we show that the proposed theory can be useful in practice. We apply it during the selection step of genetic optimization and demonstrate that the resulting algorithm outperforms existing state-of-the-art approaches such as NSGA-II and NSGA-III. We also demonstrate that the provided general formulation allows discovering interesting relationships and applying the developed theory to different applications. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ruppert22a.html
  PDF: https://proceedings.mlr.press/v151/ruppert22a/ruppert22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ruppert22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jean
    family: Ruppert
  - given: Marharyta
    family: Aleksandrova
  - given: Thomas
    family: Engel
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1138-1160
  id: ruppert22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1138
  lastpage: 1160
  published: 2022-05-03 00:00:00 +0000
- title: ' Towards Return Parity in Markov Decision Processes '
  abstract: ' Algorithmic decisions made by machine learning models in high-stakes domains may have lasting impacts over time. However, naive applications of standard fairness criteria in static settings over temporal domains may lead to delayed and adverse effects. To understand the dynamics of performance disparity, we study a fairness problem in Markov decision processes (MDPs). Specifically, we propose return parity, a fairness notion that requires MDPs from different demographic groups that share the same state and action spaces to achieve approximately the same expected time-discounted rewards. We first provide a decomposition theorem for return disparity, which decomposes the return disparity of any two MDPs sharing the same state and action spaces into the distance between group-wise reward functions, the discrepancy of group policies, and the discrepancy between state visitation distributions induced by the group policies. Motivated by our decomposition theorem, we propose algorithms to mitigate return disparity via learning a shared group policy with state visitation distributional alignment using integral probability metrics. We conduct experiments to corroborate our results, showing that the proposed algorithm can successfully close the disparity gap while maintaining the performance of policies on two real-world recommender system benchmark datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chi22a.html
  PDF: https://proceedings.mlr.press/v151/chi22a/chi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jianfeng
    family: Chi
  - given: Jian
    family: Shen
  - given: Xinyi
    family: Dai
  - given: Weinan
    family: Zhang
  - given: Yuan
    family: Tian
  - given: Han
    family: Zhao
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1161-1178
  id: chi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1161
  lastpage: 1178
  published: 2022-05-03 00:00:00 +0000
- title: ' Uncertainty Quantification for Low-Rank Matrix Completion with Heterogeneous and Sub-Exponential Noise '
  abstract: ' The problem of low-rank matrix completion with heterogeneous and sub-exponential (as opposed to homogeneous Gaussian) noise is particularly relevant to a number of applications in modern commerce. Examples include panel sales data and data collected from web-commerce systems such as recommendation engines. An important unresolved question for this problem is characterizing the distribution of estimated matrix entries under common low-rank estimators. Such a characterization is essential to any application that requires quantification of uncertainty in these estimates and has heretofore only been available under the assumption of homogeneous Gaussian noise. Here we characterize the distribution of estimated matrix entries when the observation noise is heterogeneous sub-exponential and provide, as an application, explicit formulas for this distribution when observed entries are Poisson or Binary distributed. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/farias22a.html
  PDF: https://proceedings.mlr.press/v151/farias22a/farias22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-farias22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vivek
    family: Farias
  - given: Andrew A.
    family: Li
  - given: Tianyi
    family: Peng
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1179-1189
  id: farias22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1179
  lastpage: 1189
  published: 2022-05-03 00:00:00 +0000
- title: ' Survival regression with proper scoring rules and monotonic neural networks '
  abstract: ' We consider frequently used scoring rules for right-censored survival regression models such as time-dependent concordance, survival-CRPS, integrated Brier score and integrated binomial log-likelihood, and prove that none of them is a proper scoring rule. This means that the true survival distribution may be scored worse than incorrect distributions, leading to inaccurate estimation. We prove, in contrast to these scores, that the right-censored log-likelihood is a proper scoring rule, i.e. the highest expected score is achieved by the true distribution. Despite this, modern feed-forward neural-network-based survival regression models are unable to train and validate directly on right-censored log-likelihood, due to its intractability, and resort to the aforementioned alternatives, i.e. non-proper scoring rules. We therefore propose a simple novel survival regression method capable of directly optimizing log-likelihood using a monotonic restriction on the time-dependent weights, coined SurvivalMonotonic-net (SuMo-net). SuMo-net achieves state-of-the-art log-likelihood scores across several datasets with 20–100x computational speedup on inference over existing state-of-the-art neural methods and is readily applicable to datasets with several million observations. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/rindt22a.html
  PDF: https://proceedings.mlr.press/v151/rindt22a/rindt22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-rindt22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Rindt
  - given: Robert
    family: Hu
  - given: David
    family: Steinsaltz
  - given: Dino
    family: Sejdinovic
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1190-1205
  id: rindt22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1190
  lastpage: 1205
  published: 2022-05-03 00:00:00 +0000
- title: ' Physics Informed Deep Kernel Learning '
  abstract: ' Deep kernel learning is a promising combination of deep neural networks and nonparametric function estimation. However, as a data-driven approach, the performance of deep kernel learning can still be restricted by scarce or insufficient data, especially in extrapolation tasks. To address these limitations, we propose Physics Informed Deep Kernel Learning (PI-DKL) that exploits physics knowledge represented by differential equations with latent sources. Specifically, we use the posterior function sample of the Gaussian process as the surrogate for the solution of the differential equation, and construct a generative component to integrate the equation in a principled Bayesian hybrid framework. For efficient and effective inference, we marginalize out the latent variables in the joint probability and derive a collapsed model evidence lower bound (ELBO), based on which we develop a stochastic model estimation algorithm. Our ELBO can be viewed as a nice, interpretable posterior regularization objective. On synthetic datasets and real-world applications, we show the advantage of our approach in both prediction accuracy and uncertainty quantification. The code is available at https://github.com/GregDobby/PIDKL. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22a.html
  PDF: https://proceedings.mlr.press/v151/wang22a/wang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zheng
    family: Wang
  - given: Wei
    family: Xing
  - given: Robert
    family: Kirby
  - given: Shandian
    family: Zhe
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1206-1218
  id: wang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1206
  lastpage: 1218
  published: 2022-05-03 00:00:00 +0000
- title: ' Fast Distributionally Robust Learning with Variance-Reduced Min-Max Optimization '
  abstract: ' Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications, reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity. Existing algorithms for solving Wasserstein DRSL (one of the most popular DRSL frameworks, based around robustness to perturbations in the Wasserstein distance) have serious limitations that restrict their use in large-scale problems: in particular, they involve solving complex subproblems and fail to make use of stochastic gradients. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms which provably achieve faster convergence rates than existing approaches. We demonstrate their effectiveness on synthetic and real data when compared to existing DRSL approaches. Key to our results is the use of variance reduction and random reshuffling to accelerate stochastic min-max optimization, the analysis of which may be of independent interest. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yu22a.html
  PDF: https://proceedings.mlr.press/v151/yu22a/yu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yaodong
    family: Yu
  - given: Tianyi
    family: Lin
  - given: Eric V.
    family: Mazumdar
  - given: Michael
    family: Jordan
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1219-1250
  id: yu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1219
  lastpage: 1250
  published: 2022-05-03 00:00:00 +0000
- title: ' Heavy-tailed Streaming Statistical Estimation '
  abstract: ' We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. This could also be viewed as stochastic optimization under heavy-tailed distributions, with an additional $O(p)$ space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis, under a more nuanced condition on the noise of the stochastic gradients, which we show is critical when analyzing stochastic optimization problems arising from general statistical estimation problems. Our results guarantee convergence not just in expectation but with exponential concentration, and moreover do so using $O(1)$ batch size. We provide consequences of our results for mean estimation and linear regression. Finally, we provide empirical corroboration of our results and algorithms via synthetic experiments for mean estimation and linear regression. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tsai22a.html
  PDF: https://proceedings.mlr.press/v151/tsai22a/tsai22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tsai22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Che-Ping
    family: Tsai
  - given: Adarsh
    family: Prasad
  - given: Sivaraman
    family: Balakrishnan
  - given: Pradeep
    family: Ravikumar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1251-1282
  id: tsai22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1251
  lastpage: 1282
  published: 2022-05-03 00:00:00 +0000
- title: ' On Distributionally Robust Optimization and Data Rebalancing '
  abstract: ' Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. Distributionally Robust Optimization (DRO) seemingly addresses this problem by minimizing the worst expected risk across subpopulations. We establish theoretical results that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset. The results cover finite and infinite number of training distributions, as well as convex and non-convex loss functions. An implication of our results is that for each DRO problem there exists a data distribution such that learning this distribution is equivalent to solving the DRO problem. Yet, important problems that DRO seeks to address (for instance, adversarial robustness and fighting bias) cannot be reduced to finding the one ’unbiased’ dataset. Our discussion section addresses this important discrepancy. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/slowik22a.html
  PDF: https://proceedings.mlr.press/v151/slowik22a/slowik22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-slowik22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Agnieszka
    family: Słowik
  - given: Leon
    family: Bottou
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1283-1297
  id: slowik22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1283
  lastpage: 1297
  published: 2022-05-03 00:00:00 +0000
- title: ' Spiked Covariance Estimation from Modulo-Reduced Measurements '
  abstract: ' Consider the rank-1 spiked model: $\bf{X}=\sqrt{\nu}\xi \bf{u}+ \bf{Z}$, where $\nu$ is the spike intensity, $\bf{u}\in\mathbb{S}^{k-1}$ is an unknown direction and $\xi\sim \mathcal{N}(0,1),\bf{Z}\sim \mathcal{N}(\bf{0},\bf{I})$. Motivated by recent advances in analog-to-digital conversion, we study the problem of recovering $\bf{u}\in \mathbb{S}^{k-1}$ from $n$ i.i.d. modulo-reduced measurements $\bf{Y}=[\bf{X}]\mod \Delta$, focusing on the high-dimensional regime ($k\gg 1$). We develop and analyze an algorithm that, for most directions $\bf{u}$ and $\nu=\mathrm{poly}(k)$, estimates $\bf{u}$ to high accuracy using $n=\mathrm{poly}(k)$ measurements, provided that $\Delta\gtrsim \sqrt{\log k}$. Up to constants, our algorithm accurately estimates $\bf{u}$ at the smallest possible $\Delta$ that allows (in an information-theoretic sense) to recover $\bf{X}$ from $\bf{Y}$. A key step in our analysis involves estimating the probability that a line segment of length $\approx\sqrt{\nu}$ in a random direction $\bf{u}$ passes near a point in the lattice $\Delta \mathbb{Z}^k$. Numerical experiments show that the developed algorithm performs well even in a non-asymptotic setting. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/romanov22a.html
  PDF: https://proceedings.mlr.press/v151/romanov22a/romanov22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-romanov22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Elad
    family: Romanov
  - given: Or
    family: Ordentlich
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1298-1320
  id: romanov22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1298
  lastpage: 1320
  published: 2022-05-03 00:00:00 +0000
- title: ' A Contraction Theory Approach to Optimization Algorithms from Acceleration Flows '
  abstract: ' Much recent interest has focused on the design of optimization algorithms from the discretization of an associated optimization flow, i.e., a system of differential equations (ODEs) whose trajectories solve an associated optimization problem. Such a design approach poses an important problem: how to find a principled methodology to design and discretize appropriate ODEs. This paper aims to provide a solution to this problem through the use of contraction theory. We first introduce general mathematical results that explain how contraction theory guarantees the stability of the implicit and explicit Euler integration methods. Then, we propose a novel system of ODEs, namely the Accelerated-Contracting-Nesterov flow, and use contraction theory to establish it is an optimization flow with exponential convergence rate, from which the linear convergence rate of its associated optimization algorithm is immediately established. Remarkably, a simple explicit Euler discretization of this flow corresponds to the Nesterov acceleration method. Finally, we present how our approach leads to performance guarantees in the design of optimization algorithms for time-varying optimization problems. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cisneros-velarde22a.html
  PDF: https://proceedings.mlr.press/v151/cisneros-velarde22a/cisneros-velarde22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cisneros-velarde22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pedro
    family: Cisneros-Velarde
  - given: Francesco
    family: Bullo
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1321-1335
  id: cisneros-velarde22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1321
  lastpage: 1335
  published: 2022-05-03 00:00:00 +0000
- title: ' Robust Probabilistic Time Series Forecasting '
  abstract: ' Probabilistic time series forecasting has played a critical role in decision-making processes due to its capability to quantify uncertainties. Deep forecasting models, however, could be prone to input perturbations, and the notion of such perturbations, together with that of robustness, has not even been completely established in the regime of probabilistic forecasting. In this work, we propose a framework for robust probabilistic time series forecasting. First, we generalize the concept of adversarial input perturbations, based on which we formulate the concept of robustness in terms of bounded Wasserstein deviation. Then we extend the randomized smoothing technique to attain robust probabilistic forecasters with theoretical robustness certificates against certain classes of adversarial perturbations. Lastly, extensive experiments demonstrate that our methods are empirically effective in enhancing the forecast quality under additive adversarial attacks and forecast consistency under supplement of noisy observations. The code for our experiments is available at https://github.com/tetrzim/robust-probabilistic-forecasting. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yoon22a.html
  PDF: https://proceedings.mlr.press/v151/yoon22a/yoon22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yoon22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Taeho
    family: Yoon
  - given: Youngsuk
    family: Park
  - given: Ernest K.
    family: Ryu
  - given: Yuyang
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1336-1358
  id: yoon22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1336
  lastpage: 1358
  published: 2022-05-03 00:00:00 +0000
- title: ' VFDS: Variational Foresight Dynamic Selection in Bayesian Neural Networks for Efficient Human Activity Recognition '
  abstract: ' In many machine learning tasks, input features with varying degrees of predictive capability are acquired at varying costs. In order to optimize the performance-cost trade-off, one would select features to observe a priori. However, given the changing context with previous observations, the subset of predictive features to select may change dynamically. Therefore, we face the challenging new problem of foresight dynamic selection (FDS): finding a dynamic and light-weight policy to decide which features to observe next, before actually observing them, for overall performance-cost trade-offs. To tackle FDS, this paper proposes a Bayesian learning framework of Variational Foresight Dynamic Selection (VFDS). VFDS learns a policy that selects the next feature subset to observe, by optimizing a variational Bayesian objective that characterizes the trade-off between model performance and feature cost. At its core is an implicit variational distribution on binary gates that are dependent on previous observations, which will select the next subset of features to observe. We apply VFDS to the Human Activity Recognition (HAR) task where the performance-cost trade-off is critical in its practice. Extensive results demonstrate that VFDS selects different features under changing contexts, notably saving sensory costs while maintaining or improving the HAR accuracy. Moreover, the features that VFDS dynamically selects are shown to be interpretable and associated with the different activity types. We will release the code. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ardywibowo22a.html
  PDF: https://proceedings.mlr.press/v151/ardywibowo22a/ardywibowo22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ardywibowo22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Randy
    family: Ardywibowo
  - given: Shahin
    family: Boluki
  - given: Zhangyang
    family: Wang
  - given: Bobak J.
    family: Mortazavi
  - given: Shuai
    family: Huang
  - given: Xiaoning
    family: Qian
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1359-1379
  id: ardywibowo22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1359
  lastpage: 1379
  published: 2022-05-03 00:00:00 +0000
- title: ' Implicitly Regularized RL with Implicit Q-values '
  abstract: ' The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms for which RL agents behave following a (soft)-greedy policy w.r.t. $Q$. It is a powerful tool that allows action selection without a model of the environment and even without explicitly modeling the policy. Yet, this scheme can only be used in discrete action tasks, with small numbers of actions, as the softmax over actions cannot be computed exactly otherwise. More specifically, the usage of function approximation to deal with continuous action spaces in modern actor-critic architectures intrinsically prevents the exact computation of a softmax. We propose to alleviate this issue by parametrizing the $Q$-function implicitly, as the sum of a log-policy and a value function. We use the resulting parametrization to derive a practical off-policy deep RL algorithm, suitable for large action spaces, and that enforces the softmax relation between the policy and the $Q$-value. We provide a theoretical analysis of our algorithm: from an Approximate Dynamic Programming perspective, we show its equivalence to a regularized version of value iteration, accounting for both entropy and Kullback-Leibler regularization, and that enjoys beneficial error propagation results. We then evaluate our algorithm on classic control tasks, where its results compete with state-of-the-art methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/vieillard22a.html
  PDF: https://proceedings.mlr.press/v151/vieillard22a/vieillard22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-vieillard22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nino
    family: Vieillard
  - given: Marcin
    family: Andrychowicz
  - given: Anton
    family: Raichuk
  - given: Olivier
    family: Pietquin
  - given: Matthieu
    family: Geist
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1380-1402
  id: vieillard22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1380
  lastpage: 1402
  published: 2022-05-03 00:00:00 +0000
- title: ' A Witness Two-Sample Test '
  abstract: ' The Maximum Mean Discrepancy (MMD) has been the state-of-the-art nonparametric test for tackling the two-sample problem. Its statistic is given by the difference in expectations of the witness function, a real-valued function defined as a weighted sum of kernel evaluations on a set of basis points. Typically the kernel is optimized on a training set, and hypothesis testing is performed on a separate test set to avoid overfitting (i.e., control type-I error). That is, the test set is used to simultaneously estimate the expectations and define the basis points, while the training set only serves to select the kernel and is discarded. In this work, we propose to use the training data to also define the weights and the basis points for better data efficiency. We show that 1) the new test is consistent and has a well-controlled type-I error; 2) the optimal witness function is given by a precision-weighted mean in the reproducing kernel Hilbert space associated with the kernel; and 3) the test power of the proposed test is comparable to or exceeds that of the MMD and other modern tests, as verified empirically on challenging synthetic and real problems (e.g., Higgs data). '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kubler22a.html
  PDF: https://proceedings.mlr.press/v151/kubler22a/kubler22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kubler22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jonas M.
    family: Kübler
  - given: Wittawat
    family: Jitkrittum
  - given: Bernhard
    family: Schölkopf
  - given: Krikamol
    family: Muandet
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1403-1419
  id: kubler22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1403
  lastpage: 1419
  published: 2022-05-03 00:00:00 +0000
- title: ' Sample-and-threshold differential privacy: Histograms and applications '
  abstract: ' Federated analytics seeks to compute accurate statistics from data distributed across users’ devices while providing a suitable privacy guarantee and being practically feasible to implement and scale. In this paper, we show how a strong (epsilon, delta)-Differential Privacy (DP) guarantee can be achieved for the fundamental problem of histogram generation in a federated setting, via a highly practical sampling-based procedure that does not add noise to disclosed data. Given the ubiquity of sampling in practice, we thus obtain a DP guarantee almost for free, avoid over-estimating histogram counts, and allow easy reasoning about how privacy guarantees may obscure minorities and outliers. Using such histograms, related problems such as heavy hitters and quantiles can be answered with provable error and privacy guarantees. Experimental results show that our sample-and-threshold approach is accurate and scalable. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cormode22a.html
  PDF: https://proceedings.mlr.press/v151/cormode22a/cormode22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cormode22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Graham
    family: Cormode
  - given: Akash
    family: Bharadwaj
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1420-1431
  id: cormode22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1420
  lastpage: 1431
  published: 2022-05-03 00:00:00 +0000
- title: ' Common Failure Modes of Subcluster-based Sampling in Dirichlet Process Gaussian Mixture Models - and a Deep-learning Solution '
  abstract: ' The Dirichlet Process Gaussian Mixture Model (DPGMM) is often used to cluster data when the number of clusters is unknown. One main DPGMM inference paradigm relies on sampling. Here we consider a known state-of-the-art sampler (proposed by Chang and Fisher III (2013) and improved by Dinari et al. (2019)), analyze its failure modes, and show how to improve it, often drastically. Concretely, in that sampler, whenever a new cluster is formed it is augmented with two subclusters whose labels are initialized at random. Upon their evolution, the subclusters serve to propose a split of the parent cluster. We show that the random initialization is often problematic and hurts the otherwise-effective sampler. Specifically, we demonstrate that this initialization tends to lead to poor split proposals and/or too many iterations before a desired split is accepted. This slows convergence and can damage the clustering. As a remedy, we propose two drop-in-replacement options for the subcluster-initialization subroutine. The first is an intuitive heuristic while the second is based on deep learning. We show that the proposed approach yields better splits, which in turn translate to substantial improvements in performance, results, and stability. Our code is publicly available. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/winter22a.html
  PDF: https://proceedings.mlr.press/v151/winter22a/winter22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-winter22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vlad
    family: Winter
  - given: Or
    family: Dinari
  - given: Oren
    family: Freifeld
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1432-1456
  id: winter22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1432
  lastpage: 1456
  published: 2022-05-03 00:00:00 +0000
- title: ' A Complete Characterisation of ReLU-Invariant Distributions '
  abstract: ' We give a complete characterisation of families of probability distributions that are invariant under the action of ReLU neural network layers (in the same way that the family of Gaussian distributions is invariant to affine linear transformations). The need for such families arises during the training of Bayesian networks or the analysis of trained neural networks, e.g., in the context of uncertainty quantification (UQ) or explainable artificial intelligence (XAI). We prove that no invariant parametrised family of distributions can exist unless at least one of the following three restrictions holds: First, the network layers have a width of one, which is unreasonable for practical neural networks. Second, the probability measures in the family have finite support, which basically amounts to sampling distributions. Third, the parametrisation of the family is not locally Lipschitz continuous, which excludes all computationally feasible families. Finally, we show that these restrictions are individually necessary. For each of the three cases we can construct an invariant family exploiting exactly one of the restrictions but not the other two. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/macdonald22a.html
  PDF: https://proceedings.mlr.press/v151/macdonald22a/macdonald22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-macdonald22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jan
    family: Macdonald
  - given: Stephan
    family: Wäldchen
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1457-1484
  id: macdonald22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1457
  lastpage: 1484
  published: 2022-05-03 00:00:00 +0000
- title: ' Threading the Needle of On and Off-Manifold Value Functions for Shapley Explanations '
  abstract: ' A popular explainable AI (XAI) approach to quantify feature importance of a given model is via Shapley values. These Shapley values arose in cooperative games, and hence a critical ingredient to compute these in an XAI context is a so-called value function, that computes the “value” of a subset of features, and which connects machine learning models to cooperative games. There are many possible choices for such value functions, which broadly fall into two categories: on-manifold and off-manifold value functions, which take an observational and an interventional viewpoint respectively. Both these classes however have their respective flaws, where on-manifold value functions violate key axiomatic properties and are computationally expensive, while off-manifold value functions pay less heed to the data manifold and evaluate the model on regions for which it wasn’t trained. Thus, there is no consensus on which class of value functions to use. In this paper, we show that in addition to these existing issues, both classes of value functions are prone to adversarial manipulations on low density regions. We formalize the desiderata of value functions that respect both the model and the data manifold in a set of axioms and are robust to perturbation on off-manifold regions, and show that there exists a unique value function that satisfies these axioms, which we term the Joint Baseline value function, and the resulting Shapley value the Joint Baseline Shapley (JBshap), and validate the effectiveness of JBshap in experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yeh22a.html
  PDF: https://proceedings.mlr.press/v151/yeh22a/yeh22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yeh22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chih-Kuan
    family: Yeh
  - given: Kuan-Yun
    family: Lee
  - given: Frederick
    family: Liu
  - given: Pradeep
    family: Ravikumar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1485-1502
  id: yeh22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1485
  lastpage: 1502
  published: 2022-05-03 00:00:00 +0000
- title: ' Discovering Inductive Bias with Gibbs Priors: A Diagnostic Tool for Approximate Bayesian Inference '
  abstract: ' Full Bayesian posteriors are rarely analytically tractable, which is why real-world Bayesian inference heavily relies on approximate techniques. Approximations generally differ from the true posterior and require diagnostic tools to assess whether the inference can still be trusted. We investigate a new approach to diagnosing approximate inference: the approximation mismatch is attributed to a change in the inductive bias by treating the approximations as exact and reverse-engineering the corresponding prior. We show that the problem is more complicated than it appears to be at first glance, because the solution generally depends on the observation. By reframing the problem in terms of incompatible conditional distributions we arrive at a natural solution: the Gibbs prior. The resulting diagnostic is based on pseudo-Gibbs sampling, which is widely applicable and easy to implement. We illustrate how the Gibbs prior can be used to discover the inductive bias in a controlled Gaussian setting and for a variety of Bayesian models and approximations. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/rendsburg22a.html
  PDF: https://proceedings.mlr.press/v151/rendsburg22a/rendsburg22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-rendsburg22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Luca
    family: Rendsburg
  - given: Agustinus
    family: Kristiadi
  - given: Philipp
    family: Hennig
  - given: Ulrike
    family: von Luxburg
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1503-1526
  id: rendsburg22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1503
  lastpage: 1526
  published: 2022-05-03 00:00:00 +0000
- title: ' Equivariance Discovery by Learned Parameter-Sharing '
  abstract: ' Designing equivariance as an inductive bias into deep-nets has been a prominent approach to build effective models, e.g., a convolutional neural network incorporates translation equivariance. However, incorporating these inductive biases requires knowledge about the equivariance properties of the data, which may not be available, e.g., when encountering a new domain. To address this, we study how to "discover interpretable equivariances" from data. Specifically, we formulate this discovery process as an optimization problem over a model’s parameter-sharing schemes. We propose to use the partition distance to empirically quantify the accuracy of the recovered equivariance. Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme. Empirically, we show that the approach recovers known equivariances, such as permutations and shifts, on sum of numbers and spatially-invariant data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yeh22b.html
  PDF: https://proceedings.mlr.press/v151/yeh22b/yeh22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yeh22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Raymond A.
    family: Yeh
  - given: Yuan-Ting
    family: Hu
  - given: Mark
    family: Hasegawa-Johnson
  - given: Alexander
    family: Schwing
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1527-1545
  id: yeh22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1527
  lastpage: 1545
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits '
  abstract: ' In this paper we investigate the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike previous results that assume bounded/sub-Gaussian reward distributions, we focus on the setting where each arm’s reward distribution only has a $(1+v)$-th moment with some $v\in (0, 1]$. In the first part, we study the problem in the central $\epsilon$-DP model. We first provide a near-optimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we establish the lower bound to show that the instance-dependent regret of our improved algorithm is optimal. In the second part, we study the problem in the $\epsilon$-LDP model. We propose an algorithm that can be seen as a locally private and robust version of the SE algorithm, which provably achieves (near) optimal rates for both instance-dependent and instance-independent regret. Our results reveal differences between the problem of private MAB with bounded/sub-Gaussian rewards and heavy-tailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which might be used for other related problems. Finally, experiments also support our theoretical findings and show the effectiveness of our algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tao22a.html
  PDF: https://proceedings.mlr.press/v151/tao22a/tao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Youming
    family: Tao
  - given: Yulian
    family: Wu
  - given: Peng
    family: Zhao
  - given: Di
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1546-1574
  id: tao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1546
  lastpage: 1574
  published: 2022-05-03 00:00:00 +0000
- title: ' Standardisation-function Kernel Stein Discrepancy: A Unifying View on Kernel Stein Discrepancy Tests for Goodness-of-fit '
  abstract: ' Non-parametric goodness-of-fit testing procedures based on kernel Stein discrepancies (KSD) are promising approaches to validate general unnormalised distributions in various scenarios. Existing works focused on studying kernel choices to boost test performances. However, the choices of (non-unique) Stein operators also have a considerable effect on the test performances. Inspired by the standardisation technique that was originally developed to better derive approximation properties for normal distributions, we present a unifying framework, called standardisation-function kernel Stein discrepancy (Sf-KSD), to study different Stein operators in KSD-based tests for goodness-of-fit. We derive explicitly how the proposed framework relates to existing KSD-based tests and show that Sf-KSD can be used as a guide to develop novel kernel-based non-parametric tests on complex data scenarios, e.g. truncated distributions or compositional data. Experimental results demonstrate that the proposed tests control type-I error well and achieve higher test power than existing approaches. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/xu22b.html
  PDF: https://proceedings.mlr.press/v151/xu22b/xu22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-xu22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Wenkai
    family: Xu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1575-1597
  id: xu22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1575
  lastpage: 1597
  published: 2022-05-03 00:00:00 +0000
- title: ' Parametric Bootstrap for Differentially Private Confidence Intervals '
  abstract: ' The goal of this paper is to develop a practical and general-purpose approach to construct confidence intervals for differentially private parametric estimation. We find that the parametric bootstrap is a simple and effective solution. It cleanly reasons about variability of both the data sample and the randomized privacy mechanism and applies "out of the box" to a wide class of private estimation routines. It can also help correct bias caused by clipping data to limit sensitivity. We prove that the parametric bootstrap gives consistent confidence intervals in two broadly relevant settings, including a novel adaptation to linear regression that avoids accessing the covariate data multiple times. We demonstrate its effectiveness for a variety of estimators, and find empirically that it provides confidence intervals with good coverage even at modest sample sizes and performs better than alternative approaches. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ferrando22a.html
  PDF: https://proceedings.mlr.press/v151/ferrando22a/ferrando22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ferrando22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Cecilia
    family: Ferrando
  - given: Shufan
    family: Wang
  - given: Daniel
    family: Sheldon
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1598-1618
  id: ferrando22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1598
  lastpage: 1618
  published: 2022-05-03 00:00:00 +0000
- title: ' Nearly Tight Convergence Bounds for Semi-discrete Entropic Optimal Transport '
  abstract: ' We derive nearly tight and non-asymptotic convergence bounds for solutions of entropic semi-discrete optimal transport. These bounds quantify the stability of the dual solutions of the regularized problem (sometimes called Sinkhorn potentials) w.r.t. the regularization parameter, for which we ensure a better than Lipschitz dependence. Such facts may be a first step towards a mathematical justification of $\varepsilon$-scaling heuristics for the numerical resolution of regularized semi-discrete optimal transport. Our results also entail a non-asymptotic and tight expansion of the difference between the entropic and the unregularized costs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/delalande22a.html
  PDF: https://proceedings.mlr.press/v151/delalande22a/delalande22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-delalande22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alex
    family: Delalande
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1619-1642
  id: delalande22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1619
  lastpage: 1642
  published: 2022-05-03 00:00:00 +0000
- title: ' Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection '
  abstract: ' Multivariate time series anomaly detection has become an active area of research in recent years, with Deep Learning models outperforming previous approaches on benchmark datasets. Among reconstruction-based models, most previous work has focused on Variational Autoencoders and Generative Adversarial Networks. This work presents DGHL, a new family of generative models for time series anomaly detection, trained by maximizing the observed likelihood by posterior sampling and alternating back-propagation. A top-down Convolution Network maps a novel hierarchical latent space to time series windows, exploiting temporal dynamics to encode information efficiently. Despite relying on posterior sampling, it is computationally more efficient than current approaches, with up to 10x shorter training times than RNN based models. Our method outperformed current state-of-the-art models on four popular benchmark datasets. Finally, DGHL is robust to variable features between entities and accurate even with large proportions of missing values, settings with increasing relevance with the advent of IoT. We demonstrate the superior robustness of DGHL with novel occlusion experiments in this literature. Our code is available at https://github.com/cchallu/dghl. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/challu22a.html
  PDF: https://proceedings.mlr.press/v151/challu22a/challu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-challu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Cristian I.
    family: Challu
  - given: Peihong
    family: Jiang
  - given: Ying Nian
    family: Wu
  - given: Laurent
    family: Callot
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1643-1654
  id: challu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1643
  lastpage: 1654
  published: 2022-05-03 00:00:00 +0000
- title: ' Online Page Migration with ML Advice '
  abstract: ' We consider online algorithms for the page migration problem that use predictions, potentially imperfect, to improve their performance. The best known online algorithms for this problem, due to Westbrook’94 and Bienkowski et al’17, have competitive ratios strictly bounded away from 1. In contrast, we show that if the algorithm is given a prediction of the input sequence, then it can achieve a competitive ratio that tends to $1$ as the prediction error rate tends to $0$. Specifically, the competitive ratio is equal to $1+O(q)$, where $q$ is the prediction error rate. We also design a “fallback option” that ensures that the competitive ratio of the algorithm for any input sequence is at most $O(1/q)$. Our result adds to the recent body of work that uses machine learning to improve the performance of “classic” algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/indyk22a.html
  PDF: https://proceedings.mlr.press/v151/indyk22a/indyk22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-indyk22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Piotr
    family: Indyk
  - given: Frederik
    family: Mallmann-Trenn
  - given: Slobodan
    family: Mitrovic
  - given: Ronitt
    family: Rubinfeld
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1655-1670
  id: indyk22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1655
  lastpage: 1670
  published: 2022-05-03 00:00:00 +0000
- title: ' Policy Learning for Optimal Individualized Dose Intervals '
  abstract: ' We study the problem of learning individualized dose intervals using observational data. There are very few previous works for policy learning with continuous treatment, and all of them focused on recommending an optimal dose rather than an optimal dose interval. In this paper, we propose a new method to estimate such an optimal dose interval, named probability dose interval (PDI). The potential outcomes for doses in the PDI are guaranteed better than a pre-specified threshold with a given probability (e.g., $50\%$). The associated nonconvex optimization problem can be efficiently solved by the Difference-of-Convex functions (DC) algorithm. We prove that our estimated policy is consistent, and its risk converges to that of the best-in-class policy at a root-n rate. Numerical simulations show the advantage of the proposed method over outcome modeling based benchmarks. We further demonstrate the performance of our method in determining individualized Hemoglobin A1c (HbA1c) control intervals for elderly patients with diabetes. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22c.html
  PDF: https://proceedings.mlr.press/v151/chen22c/chen22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Guanhua
    family: Chen
  - given: Xiaomao
    family: Li
  - given: Menggang
    family: Yu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1671-1693
  id: chen22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1671
  lastpage: 1693
  published: 2022-05-03 00:00:00 +0000
- title: ' Deep Multi-Fidelity Active Learning of High-Dimensional Outputs '
  abstract: ' Many applications, such as in physical simulation and engineering design, demand we estimate functions with high-dimensional outputs. To reduce the expensive cost of generating training examples, we usually choose several fidelities to enable a cost/quality trade-off. In this paper, we consider the active learning task to automatically identify the fidelities and training inputs to query new examples so as to achieve the best learning benefit-cost ratio. To this end, we propose DMFAL, a Deep Multi-Fidelity Active Learning approach. We first develop a deep neural network-based multi-fidelity model for high-dimensional outputs, which can flexibly capture strong complex correlations across the outputs and fidelities to enhance the learning of the target function. We then propose a mutual information based acquisition function that extends the predictive entropy principle. To overcome the computational challenges caused by large output dimensions, we use the multi-variate delta method and moment-matching to estimate the output posterior, and the Weinstein-Aronszajn identity to calculate and optimize the acquisition function. We show the advantage of our method in several applications of computational physics and engineering design. The code is available at https://github.com/shib0li/DMFAL. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/li22b.html
  PDF: https://proceedings.mlr.press/v151/li22b/li22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-li22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shibo
    family: Li
  - given: Zheng
    family: Wang
  - given: Robert
    family: Kirby
  - given: Shandian
    family: Zhe
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1694-1711
  id: li22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1694
  lastpage: 1711
  published: 2022-05-03 00:00:00 +0000
- title: ' Online Learning for Unknown Partially Observable MDPs '
  abstract: ' Solving Partially Observable Markov Decision Processes (POMDPs) is hard. Learning optimal controllers for POMDPs when the model is unknown is harder. Online learning of optimal controllers for unknown POMDPs, which requires efficient learning using regret-minimizing algorithms that effectively trade off exploration and exploitation, is even harder, and no solution exists currently. In this paper, we consider infinite-horizon average-cost POMDPs with an unknown transition model but a known observation model. We propose a natural posterior sampling-based reinforcement learning algorithm (PSRL-POMDP) and show that it achieves a regret bound of $O(\log T)$, where $T$ is the time horizon, when the parameter set is finite. In the general case (continuous parameter set), we show that the algorithm achieves $O(T^{2/3})$ regret under two technical assumptions. To the best of our knowledge, this is the first online RL algorithm for POMDPs and has sub-linear regret. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/jafarnia-jahromi22a.html
  PDF: https://proceedings.mlr.press/v151/jafarnia-jahromi22a/jafarnia-jahromi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-jafarnia-jahromi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mehdi
    family: Jafarnia Jahromi
  - given: Rahul
    family: Jain
  - given: Ashutosh
    family: Nayyar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1712-1732
  id: jafarnia-jahromi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1712
  lastpage: 1732
  published: 2022-05-03 00:00:00 +0000
- title: ' Is Bayesian Model-Agnostic Meta Learning Better than Model-Agnostic Meta Learning, Provably? '
  abstract: ' Meta learning aims at learning a model that can quickly adapt to unseen tasks. Widely used meta learning methods include model agnostic meta learning (MAML), implicit MAML, and Bayesian MAML. Thanks to its ability to model uncertainty, Bayesian MAML often has advantageous empirical performance. However, the theoretical understanding of Bayesian MAML is still limited, especially on questions such as if and when Bayesian MAML has provably better performance than MAML. In this paper, we aim to provide theoretical justifications for Bayesian MAML’s advantageous performance by comparing the meta test risks of MAML and Bayesian MAML. In the meta linear regression, under both the distribution agnostic and linear centroid cases, we have established that Bayesian MAML indeed has provably lower meta test risks than MAML. We verify our theoretical results through experiments, the code of which is available at https://github.com/lishachen/Bayesian-MAML-vs-MAML. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22d.html
  PDF: https://proceedings.mlr.press/v151/chen22d/chen22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lisha
    family: Chen
  - given: Tianyi
    family: Chen
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1733-1774
  id: chen22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1733
  lastpage: 1774
  published: 2022-05-03 00:00:00 +0000
- title: ' A Bayesian Model for Online Activity Sample Sizes '
  abstract: ' In many contexts it is useful to predict the number of individuals in some population who will initiate a particular activity during a given period. For example, the number of users who will install a software update, the number of customers who will use a new feature on a website or who will participate in an A/B test. In practical settings, there is heterogeneity amongst individuals with regard to the distribution of time until they will initiate. For these reasons it is inappropriate to assume that the number of new individuals observed on successive days will be identically distributed. Given observations on the number of unique users participating in an initial period, we present a simple but novel Bayesian method for predicting the number of additional individuals who will participate during a subsequent period. We illustrate the performance of the method in predicting sample size in online experimentation. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/richardson22a.html
  PDF: https://proceedings.mlr.press/v151/richardson22a/richardson22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-richardson22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Thomas S.
    family: Richardson
  - given: Yu
    family: Liu
  - given: James
    family: McQueen
  - given: Doug
    family: Hains
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1775-1785
  id: richardson22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1775
  lastpage: 1785
  published: 2022-05-03 00:00:00 +0000
- title: ' Parallel MCMC Without Embarrassing Failures '
  abstract: ' Embarrassingly parallel Markov Chain Monte Carlo (MCMC) exploits parallel computing to scale Bayesian inference to large datasets by using a two-step approach. First, MCMC is run in parallel on (sub)posteriors defined on data partitions. Then, a server combines local results. While efficient, this framework is very sensitive to the quality of subposterior sampling. Common sampling problems such as missing modes or misrepresentation of low-density regions are amplified – instead of being corrected – in the combination phase, leading to catastrophic failures. In this work, we propose a novel combination strategy to mitigate this issue. Our strategy, Parallel Active Inference (PAI), leverages Gaussian Process (GP) surrogate modeling and active learning. After fitting GPs to subposteriors, PAI (i) shares information between GP surrogates to cover missing modes; and (ii) uses active sampling to individually refine subposterior approximations. We validate PAI in challenging benchmarks, including heavy-tailed and multi-modal posteriors and a real-world application to computational neuroscience. Empirical results show that PAI succeeds where previous methods catastrophically fail, with a small communication overhead. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/de-souza22a.html
  PDF: https://proceedings.mlr.press/v151/de-souza22a/de-souza22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-de-souza22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Daniel A.
    family: De Souza
  - given: Diego
    family: Mesquita
  - given: Samuel
    family: Kaski
  - given: Luigi
    family: Acerbi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1786-1804
  id: de-souza22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1786
  lastpage: 1804
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal Dynamic Regret in Proper Online Learning with Strongly Convex Losses and Beyond '
  abstract: ' We study the framework of <em>universal dynamic regret</em> minimization with <em>strongly convex</em> losses. We answer an open problem in Baby and Wang 2021 by showing that in a <em>proper learning</em> setup, Strongly Adaptive algorithms can achieve the near optimal dynamic regret of $\tilde O(d^{1/3} n^{1/3}\text{TV}[u_{1:n}]^{2/3} \vee d)$ against any comparator sequence $u_1,\ldots,u_n$ <em>simultaneously</em>, where $n$ is the time horizon and $\text{TV}[u_{1:n}]$ is the Total Variation of comparator. These results are facilitated by exploiting a number of <em>new</em> structures imposed by the KKT conditions that were not considered in Baby and Wang 2021 which also lead to other improvements over their results such as: (a) handling non-smooth losses and (b) improving the dimension dependence on regret. Further, we also derive near optimal dynamic regret rates for the special case of proper online learning with exp-concave losses and an $L_\infty$ constrained decision set. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/baby22a.html
  PDF: https://proceedings.mlr.press/v151/baby22a/baby22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-baby22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dheeraj
    family: Baby
  - given: Yu-Xiang
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1805-1845
  id: baby22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1805
  lastpage: 1845
  published: 2022-05-03 00:00:00 +0000
- title: ' Counterfactual Explanation Trees: Transparent and Consistent Actionable Recourse with Decision Trees '
  abstract: ' Counterfactual Explanation (CE) is a post-hoc explanation method that provides a perturbation for altering the prediction result of a classifier. An individual can interpret the perturbation as an "action" to obtain the desired decision results. Existing CE methods focus on providing an action, which is optimized for a given single instance. However, these CE methods do not address the case where we have to assign actions to multiple instances simultaneously. In such a case, we need a framework of CE that assigns actions to multiple instances in a transparent and consistent way. In this study, we propose Counterfactual Explanation Tree (CET) that assigns effective actions with decision trees. Due to the properties of decision trees, our CET has two advantages: (1) Transparency: the reasons for assigning actions are summarized in an interpretable structure, and (2) Consistency: these reasons do not conflict with each other. We learn a CET in two steps: (i) compute one effective action for multiple instances and (ii) partition the instances to balance the effectiveness and interpretability. Numerical experiments and user studies demonstrated the efficacy of our CET in comparison with existing methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kanamori22a.html
  PDF: https://proceedings.mlr.press/v151/kanamori22a/kanamori22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kanamori22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kentaro
    family: Kanamori
  - given: Takuya
    family: Takagi
  - given: Ken
    family: Kobayashi
  - given: Yuichi
    family: Ike
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1846-1870
  id: kanamori22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1846
  lastpage: 1870
  published: 2022-05-03 00:00:00 +0000
- title: ' Spectral risk-based learning using unbounded losses '
  abstract: ' In this work, we consider the setting of learning problems under a wide class of spectral risk (or "L-risk") functions, where a Lipschitz-continuous spectral density is used to flexibly assign weight to extreme loss values. We obtain excess risk guarantees for a derivative-free learning procedure under unbounded heavy-tailed loss distributions, and propose a computationally efficient implementation which empirically outperforms traditional risk minimizers in terms of balancing spectral risk and misclassification error. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/holland22a.html
  PDF: https://proceedings.mlr.press/v151/holland22a/holland22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-holland22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matthew J.
    family: Holland
  - given: El Mehdi
    family: Haress
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1871-1886
  id: holland22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1871
  lastpage: 1886
  published: 2022-05-03 00:00:00 +0000
- title: ' A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization '
  abstract: ' We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. By leveraging the entropy regularization, our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation. Furthermore, we propose an accelerated dual-descent method for entropy-regularized CMDPs. We prove that our method achieves the global convergence rate $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and the constraint violation for entropy-regularized CMDPs. A discussion about a linear convergence rate for CMDPs with a single constraint is also provided. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ying22a.html
  PDF: https://proceedings.mlr.press/v151/ying22a/ying22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ying22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Donghao
    family: Ying
  - given: Yuhao
    family: Ding
  - given: Javad
    family: Lavaei
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1887-1909
  id: ying22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1887
  lastpage: 1909
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Global Optimum Convergence of Momentum-based Policy Gradient '
  abstract: ' Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by establishing the first set of global convergence results of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum term improves the global optimality sample complexities of vanilla PG methods by $\tilde{\mathcal{O}}(\epsilon^{-1.5})$ and $\tilde{\mathcal{O}}(\epsilon^{-1})$, respectively, where $\epsilon>0$ is the target tolerance. Our results for the generic Fisher-non-degenerate policy parametrizations also provide the first single-loop and finite-batch PG algorithm achieving an $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity. Finally, as a by-product, our analyses provide general tools for deriving the global convergence rates of stochastic PG methods, which can be readily applied and extended to other PG estimators under the two parametrizations. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ding22a.html
  PDF: https://proceedings.mlr.press/v151/ding22a/ding22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ding22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuhao
    family: Ding
  - given: Junzi
    family: Zhang
  - given: Javad
    family: Lavaei
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1910-1934
  id: ding22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1910
  lastpage: 1934
  published: 2022-05-03 00:00:00 +0000
- title: ' Feature screening with kernel knockoffs '
  abstract: ' This article analyses three feature screening procedures: Kendall’s Tau and Spearman’s Rho (TR), the Hilbert-Schmidt Independence Criterion (HSIC) and the conditional Maximum Mean Discrepancy (cMMD), where the latter is a modified version of the standard MMD for categorical classification. These association measures are not based on any specific underlying model, such as linear regression. We provide the conditions under which the sure independence screening (SIS) property is satisfied under a lower bound assumption on the minimum signal strength of the association measure. The SIS property for the HSIC and cMMD is established for given bounded and symmetric kernels. Within the high-dimensional setting, we propose a two-step approach to control the false discovery rate (FDR) using knockoff filtering. The performance of the association measures is assessed through simulated and real data experiments and compared with existing competing screening methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/poignard22a.html
  PDF: https://proceedings.mlr.press/v151/poignard22a/poignard22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-poignard22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Benjamin
    family: Poignard
  - given: Peter J.
    family: Naylor
  - given: Héctor
    family: Climente-González
  - given: Makoto
    family: Yamada
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1935-1974
  id: poignard22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1935
  lastpage: 1974
  published: 2022-05-03 00:00:00 +0000
- title: ' Bias-Variance Decompositions for Margin Losses '
  abstract: ' We introduce a novel bias-variance decomposition for a range of strictly convex margin losses, including the logistic loss (minimized by the classic LogitBoost algorithm) as well as the squared margin loss and canonical boosting loss. Furthermore we show that, for all strictly convex margin losses, the expected risk decomposes into the risk of a "central" model and a term quantifying variation in the functional margin with respect to variations in the training data. These decompositions provide a diagnostic tool for practitioners to understand model overfitting/underfitting, and have implications for additive ensemble models—for example, when our bias-variance decomposition holds, there is a corresponding "ambiguity" decomposition, which can be used to quantify model diversity. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wood22a.html
  PDF: https://proceedings.mlr.press/v151/wood22a/wood22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wood22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Danny
    family: Wood
  - given: Tingting
    family: Mu
  - given: Gavin
    family: Brown
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 1975-2001
  id: wood22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 1975
  lastpage: 2001
  published: 2022-05-03 00:00:00 +0000
- title: ' Grassmann Stein Variational Gradient Descent '
  abstract: ' Stein variational gradient descent (SVGD) is a deterministic particle inference algorithm that provides an efficient alternative to Markov chain Monte Carlo. However, SVGD has been found to suffer from variance underestimation when the dimensionality of the target distribution is high. Recent developments have advocated projecting both the score function and the data onto real lines to sidestep this issue, although this can severely overestimate the epistemic (model) uncertainty. In this work, we propose Grassmann Stein variational gradient descent (GSVGD) as an alternative approach, which permits projections onto arbitrary dimensional subspaces. Compared with other variants of SVGD that rely on dimensionality reduction, GSVGD updates the projectors simultaneously for the score function and the data, and the optimal projectors are determined through a coupled Grassmann-valued diffusion process which explores favourable subspaces. Both our theoretical and experimental results suggest that GSVGD enjoys efficient state-space exploration in high-dimensional problems that have an intrinsic low-dimensional structure. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liu22a.html
  PDF: https://proceedings.mlr.press/v151/liu22a/liu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xing
    family: Liu
  - given: Harrison
    family: Zhu
  - given: Jean-Francois
    family: Ton
  - given: George
    family: Wynne
  - given: Andrew
    family: Duncan
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2002-2021
  id: liu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2002
  lastpage: 2021
  published: 2022-05-03 00:00:00 +0000
- title: ' Nonstochastic Bandits and Experts with Arm-Dependent Delays '
  abstract: ' We study nonstochastic bandits and experts in a delayed setting where delays depend on both time and arms. While the setting in which delays only depend on time has been extensively studied, the arm-dependent delay setting better captures real-world applications at the cost of introducing new technical challenges. In the full information (experts) setting, we design an algorithm with a first-order regret bound that reveals an interesting trade-off between delays and losses. We prove a similar first-order regret bound also for the bandit setting, when the learner is allowed to observe how many losses are missing. Our bounds are the first in the delayed setting that only depend on the losses and delays of the best arm. In the bandit setting, when no information other than the losses is observed, we still manage to prove a regret bound for bandits through a modification to the algorithm of Zimmert and Seldin (2020). Our analyses hinge on a novel bound on the drift, measuring how much better an algorithm can perform when given a look-ahead of one round. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/van-der-hoeven22a.html
  PDF: https://proceedings.mlr.press/v151/van-der-hoeven22a/van-der-hoeven22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-van-der-hoeven22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dirk
    family: Van Der Hoeven
  - given: Nicolò
    family: Cesa-Bianchi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2022-2044
  id: van-der-hoeven22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2022
  lastpage: 2044
  published: 2022-05-03 00:00:00 +0000
- title: ' Improved analysis of randomized SVD for top-eigenvector approximation '
  abstract: ' Computing the top eigenvectors of a matrix is a problem of fundamental interest to various fields. While the majority of the literature has focused on analyzing the reconstruction error of low-rank matrices associated with the retrieved eigenvectors, in many applications one is interested in finding one vector with high Rayleigh quotient. In this paper we study the problem of approximating the top-eigenvector. Given a symmetric matrix $\mathbf{A}$ with largest eigenvalue $\lambda_1$, our goal is to find a vector $\hat{\mathbf{u}}$ that approximates the leading eigenvector $\mathbf{u}_1$ with high accuracy, as measured by the ratio $R(\hat{\mathbf{u}})=\lambda_1^{-1}{\hat{\mathbf{u}}^T\mathbf{A}\hat{\mathbf{u}}}/{\hat{\mathbf{u}}^T\hat{\mathbf{u}}}$. We present a novel analysis of the randomized SVD algorithm of Halko et al. (2011) and derive tight bounds in many cases of interest. Notably, this is the first work that provides non-trivial bounds of $R(\hat{\mathbf{u}})$ for randomized SVD with any number of iterations. Our theoretical analysis is complemented with a thorough experimental study that confirms the efficiency and accuracy of the method. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tzeng22a.html
  PDF: https://proceedings.mlr.press/v151/tzeng22a/tzeng22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tzeng22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ruo-Chun
    family: Tzeng
  - given: Po-An
    family: Wang
  - given: Florian
    family: Adriaens
  - given: Aristides
    family: Gionis
  - given: Chi-Jen
    family: Lu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2045-2072
  id: tzeng22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2045
  lastpage: 2072
  published: 2022-05-03 00:00:00 +0000
- title: ' p-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets '
  abstract: ' We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses. It extends the standard probit model by replacing its link function, the standard normal cdf, by a $p$-generalized normal distribution for $p\in[1, \infty)$. The $p$-generalized normal distributions (Subbotin, 1923) are of special interest in statistical modeling because they fit much more flexibly to data. Their tail behavior can be controlled by choice of the parameter $p$, which influences the model’s sensitivity to outliers. Special cases include the Laplace, the Gaussian, and the uniform distributions. We further show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data by combining sketching techniques with importance subsampling to obtain a small data summary called a coreset. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/munteanu22a.html
  PDF: https://proceedings.mlr.press/v151/munteanu22a/munteanu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-munteanu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Munteanu
  - given: Simon
    family: Omlor
  - given: Christian
    family: Peters
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2073-2100
  id: munteanu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2073
  lastpage: 2100
  published: 2022-05-03 00:00:00 +0000
- title: ' Non-stationary Online Learning with Memory and Non-stochastic Control '
  abstract: ' We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions and thus captures temporal effects of learning problems. In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments, which compares the algorithm’s decisions against a sequence of changing comparators. We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret. The key technical challenge is how to control the switching cost, the cumulative movement of the player’s decisions, which is neatly addressed by a novel decomposition of dynamic policy regret and a careful design of the meta-learner and base-learner that explicitly regularizes the switching cost. The results are further applied to tackle non-stationarity in online non-stochastic control [Agarwal et al., 2019], i.e., controlling a linear dynamical system with adversarial disturbance and convex cost functions. We derive a novel gradient-based controller with dynamic policy regret guarantees, which is the first controller provably competitive with a sequence of changing policies for online non-stochastic control. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhao22a.html
  PDF: https://proceedings.mlr.press/v151/zhao22a/zhao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Peng
    family: Zhao
  - given: Yu-Xiang
    family: Wang
  - given: Zhi-Hua
    family: Zhou
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2101-2133
  id: zhao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2101
  lastpage: 2133
  published: 2022-05-03 00:00:00 +0000
- title: ' Dimensionality Reduction and Prioritized Exploration for Policy Search '
  abstract: ' Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates the policies at the parameter level. This class of algorithms is widely applied in robotics with movement primitives or non-differentiable policies. Furthermore, these approaches are particularly relevant where exploration at the action level could cause actuator damage or other safety issues. However, black-box optimization does not scale well with the increasing dimensionality of the policy, leading to high demand for samples, which are expensive to obtain in real-world systems. In many practical applications, policy parameters do not contribute equally to the return. Identifying the most relevant parameters makes it possible to narrow down the exploration and speed up learning. Furthermore, updating only the effective parameters requires fewer samples, improving the scalability of the method. We present a novel method to prioritize the exploration of effective parameters and cope with full covariance matrix updates. Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results. To select the effective parameters, we consider both the Pearson correlation coefficient and the Mutual Information. We showcase the capabilities of our approach on the Relative Entropy Policy Search algorithm in several simulated environments, including robotics simulations. Code is available at https://git.ias.informatik.tu-darmstadt.de/ias_code/aistats2022/dr-creps. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/memmel22a.html
  PDF: https://proceedings.mlr.press/v151/memmel22a/memmel22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-memmel22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Marius
    family: Memmel
  - given: Puze
    family: Liu
  - given: Davide
    family: Tateo
  - given: Jan
    family: Peters
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2134-2157
  id: memmel22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2134
  lastpage: 2157
  published: 2022-05-03 00:00:00 +0000
- title: ' The Curse Revisited: When are Distances Informative for the Ground Truth in Noisy High-Dimensional Data? '
  abstract: ' Distances between data points are widely used in machine learning applications. Yet, when corrupted by noise, these distances, and thus the models based upon them, may lose their usefulness in high dimensions. Indeed, the small marginal effects of the noise may then accumulate quickly, shifting empirical closest and furthest neighbors away from the ground truth. In this paper, we exactly characterize such effects in noisy high-dimensional data using an asymptotic probabilistic expression. Previously, it has been argued that neighborhood queries become meaningless and unstable when distance concentration occurs, which means that there is a poor relative discrimination between the furthest and closest neighbors in the data. However, we conclude that this is not necessarily the case when we decompose the data into a ground truth, which we aim to recover, and a noise component. More specifically, we derive that under particular conditions, empirical neighborhood relations affected by noise are still likely to be truthful even when distance concentration occurs. We also include thorough empirical verification of our results, as well as interesting experiments in which our derived ‘phase shift’, where neighbors become random or not, turns out to be identical to the phase shift where common dimensionality reduction methods perform poorly or well for recovering low-dimensional reconstructions of high-dimensional data with dense noise. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/vandaele22a.html
  PDF: https://proceedings.mlr.press/v151/vandaele22a/vandaele22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-vandaele22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Robin
    family: Vandaele
  - given: Bo
    family: Kang
  - given: Tijl
    family: De Bie
  - given: Yvan
    family: Saeys
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2158-2172
  id: vandaele22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2158
  lastpage: 2172
  published: 2022-05-03 00:00:00 +0000
- title: ' Improving Attribution Methods by Learning Submodular Functions '
  abstract: ' This work explores the novel idea of learning a submodular scoring function to improve the specificity/selectivity of existing feature attribution methods. Submodular scores are natural for attribution as they are known to accurately model the principle of diminishing returns. A new formulation for learning a deep submodular set function that is consistent with the real-valued attribution maps obtained by existing attribution methods is proposed. The final attribution value of a feature is then defined as the marginal gain in the induced submodular score of the feature in the context of other highly attributed features, thus decreasing the attribution of redundant yet discriminatory features. Experiments on multiple datasets illustrate that the proposed attribution method achieves higher specificity along with good discriminative power. The implementation of our method is publicly available at https://github.com/Piyushi-0/SEA-NN. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/manupriya22a.html
  PDF: https://proceedings.mlr.press/v151/manupriya22a/manupriya22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-manupriya22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Piyushi
    family: Manupriya
  - given: Tarun
    family: Ram Menta
  - given: Sakethanath N.
    family: Jagarlapudi
  - given: Vineeth N.
    family: Balasubramanian
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2173-2190
  id: manupriya22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2173
  lastpage: 2190
  published: 2022-05-03 00:00:00 +0000
- title: ' Conditional Gradients for the Approximately Vanishing Ideal '
  abstract: ' The vanishing ideal of a set of points X is the set of polynomials that evaluate to 0 over all points x in X and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI) for the construction of the set of generators of the approximately vanishing ideal. The constructed set of generators captures polynomial structures in data and gives rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In CGAVI, we construct the set of generators by solving specific instances of (constrained) convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW). Among other things, the constructed generators inherit the LASSO generalization bound and vanish not only on the training data but also on out-of-sample data. Moreover, CGAVI admits a compact representation of the approximately vanishing ideal by constructing few generators with sparse coefficient vectors. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wirth22a.html
  PDF: https://proceedings.mlr.press/v151/wirth22a/wirth22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wirth22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Elias S.
    family: Wirth
  - given: Sebastian
    family: Pokutta
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2191-2209
  id: wirth22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2191
  lastpage: 2209
  published: 2022-05-03 00:00:00 +0000
- title: ' Efficient Algorithms for Extreme Bandits '
  abstract: ' In this paper, we contribute to the Extreme Bandits problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward. We first study the concentration of the maximum of i.i.d. random variables under mild assumptions on the tail of the reward distributions. This analysis motivates the introduction of Quantile of Maxima (QoMax). The properties of QoMax are sufficient to build an Explore-Then-Commit (ETC) strategy, QoMax-ETC, achieving strong asymptotic guarantees despite its simplicity. We then propose and analyze a more adaptive, anytime algorithm, QoMax-SDA, which combines QoMax with a subsampling method recently introduced by Baudry et al. (2021). Both algorithms are more efficient than existing approaches in two senses: (1) they lead to better empirical performance, and (2) they enjoy a significant reduction of the storage and computational cost. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/baudry22a.html
  PDF: https://proceedings.mlr.press/v151/baudry22a/baudry22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-baudry22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dorian
    family: Baudry
  - given: Yoan
    family: Russac
  - given: Emilie
    family: Kaufmann
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2210-2248
  id: baudry22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2210
  lastpage: 2248
  published: 2022-05-03 00:00:00 +0000
- title: ' Rejection sampling from shape-constrained distributions in sublinear time '
  abstract: ' We consider the task of generating exact samples from a target distribution, known up to normalization, over a finite alphabet. The classical algorithm for this task is rejection sampling, and although it has been used in practice for decades, there is surprisingly little study of its fundamental limitations. In this work, we study the query complexity of rejection sampling in a minimax framework for various classes of discrete distributions. Our results provide new algorithms for sampling whose complexity scales sublinearly with the alphabet size. When applied to adversarial bandits, we show that a slight modification of the EXP3 algorithm reduces the per-iteration complexity from $O(K)$ to $O(\log(K) \log(K/\delta))$ with probability $1-\delta$, where $K$ is the number of arms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chewi22a.html
  PDF: https://proceedings.mlr.press/v151/chewi22a/chewi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chewi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sinho
    family: Chewi
  - given: Patrik R.
    family: Gerber
  - given: Chen
    family: Lu
  - given: Thibaut
    family: Le Gouic
  - given: Philippe
    family: Rigollet
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2249-2265
  id: chewi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2249
  lastpage: 2265
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Inconsistent Preferences with Gaussian Processes '
  abstract: ' We revisit the widely used preferential Gaussian processes (PGP) of Chu and Ghahramani [2005] and challenge their modelling assumption that imposes rankability of data items via latent utility function values. We propose a generalisation of PGP which can capture more expressive latent preferential structures in the data and thus be used to model inconsistent preferences, i.e. where transitivity is violated, or to discover clusters of comparable items via spectral decomposition of the learned preference functions. We also consider the properties of the associated covariance kernel functions and their reproducing kernel Hilbert space (RKHS), giving a simple construction that satisfies universality in the space of preference functions. Finally, we provide an extensive set of numerical experiments on simulated and real-world datasets showcasing the competitiveness of our proposed method with the state of the art. Our experimental findings support the conjecture that violations of rankability are ubiquitous in real-world preferential data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lun-chau22a.html
  PDF: https://proceedings.mlr.press/v151/lun-chau22a/lun-chau22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lun-chau22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Siu
    family: Lun Chau
  - given: Javier
    family: Gonzalez
  - given: Dino
    family: Sejdinovic
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2266-2281
  id: lun-chau22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2266
  lastpage: 2281
  published: 2022-05-03 00:00:00 +0000
- title: ' Bayesian Classifier Fusion with an Explicit Model of Correlation '
  abstract: ' Combining the outputs of multiple classifiers or experts into a single probabilistic classification is a fundamental task in machine learning with broad applications from classifier fusion to expert opinion pooling. Here we present a hierarchical Bayesian model of probabilistic classifier fusion based on a new correlated Dirichlet distribution. This distribution explicitly models positive correlations between marginally Dirichlet-distributed random vectors thereby allowing explicit modeling of correlations between base classifiers or experts. The proposed model naturally accommodates the classic Independent Opinion Pool and other independent fusion algorithms as special cases. It is evaluated by uncertainty reduction and correctness of fusion on synthetic and real-world data sets. We show that a change in performance of the fused classifier due to uncertainty reduction can be Bayes optimal even for highly correlated base classifiers. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/trick22a.html
  PDF: https://proceedings.mlr.press/v151/trick22a/trick22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-trick22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Susanne
    family: Trick
  - given: Constantin
    family: Rothkopf
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2282-2310
  id: trick22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2282
  lastpage: 2310
  published: 2022-05-03 00:00:00 +0000
- title: ' Conditionally Gaussian PAC-Bayes '
  abstract: ' Recent studies have empirically investigated different methods to train stochastic neural networks on a classification task by optimising a PAC-Bayesian bound via stochastic gradient descent. Most of these procedures need to replace the misclassification error with a surrogate loss, leading to a mismatch between the optimisation objective and the actual generalisation bound. The present paper proposes a novel training algorithm that optimises the PAC-Bayesian bound, without relying on any surrogate loss. Empirical results show that this approach outperforms currently available PAC-Bayesian training methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/clerico22a.html
  PDF: https://proceedings.mlr.press/v151/clerico22a/clerico22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-clerico22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Eugenio
    family: Clerico
  - given: George
    family: Deligiannidis
  - given: Arnaud
    family: Doucet
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2311-2329
  id: clerico22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2311
  lastpage: 2329
  published: 2022-05-03 00:00:00 +0000
- title: ' Leveraging Time Irreversibility with Order-Contrastive Pre-training '
  abstract: ' Label-scarce, high-dimensional domains such as healthcare present a challenge for modern machine learning techniques. To overcome the difficulties posed by a lack of labeled data, we explore an "order-contrastive" method for self-supervised pre-training on longitudinal data. We sample pairs of time segments, switch the order for half of them, and train a model to predict whether a given pair is in the correct order. Intuitively, the ordering task allows the model to attend to the least time-reversible features (for example, features that indicate progression of a chronic disease). The same features are often useful for downstream tasks of interest. To quantify this, we study a simple theoretical setting where we prove a finite-sample guarantee for the downstream error of a representation learned with order-contrastive pre-training. Empirically, in synthetic and longitudinal healthcare settings, we demonstrate the effectiveness of order-contrastive pre-training in the small-data regime over supervised learning and other self-supervised pre-training baselines. Our results indicate that pre-training methods designed for particular classes of distributions and downstream tasks can improve the performance of self-supervised learning. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/agrawal22a.html
  PDF: https://proceedings.mlr.press/v151/agrawal22a/agrawal22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-agrawal22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Monica N.
    family: Agrawal
  - given: Hunter
    family: Lang
  - given: Michael
    family: Offin
  - given: Lior
    family: Gazit
  - given: David
    family: Sontag
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2330-2353
  id: agrawal22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2330
  lastpage: 2353
  published: 2022-05-03 00:00:00 +0000
- title: ' Moment Matching Deep Contrastive Latent Variable Models '
  abstract: ' In the contrastive analysis (CA) setting, machine learning practitioners are specifically interested in discovering patterns that are enriched in a target dataset as compared to a background dataset generated from sources of variation irrelevant to the task at hand. For example, a biomedical data analyst may seek to understand variations in genomic data only present among patients with a given disease as opposed to those also present in healthy control subjects. Such scenarios have motivated the development of contrastive latent variable models to isolate variations unique to these target datasets from those shared across the target and background datasets, with current state of the art models based on the variational autoencoder (VAE) framework. However, previously proposed models do not explicitly enforce the constraints on latent variables underlying CA, potentially leading to the undesirable leakage of information between the two sets of latent variables. Here we propose the moment matching contrastive VAE (MM-cVAE), a reformulation of the VAE for CA that uses the maximum mean discrepancy to explicitly enforce two crucial latent variable constraints underlying CA. On three challenging CA tasks we find that our method outperforms the previous state-of-the-art both qualitatively and on a set of quantitative metrics. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/weinberger22a.html
  PDF: https://proceedings.mlr.press/v151/weinberger22a/weinberger22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-weinberger22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ethan
    family: Weinberger
  - given: Nicasia
    family: Beebe-Wang
  - given: Su-In
    family: Lee
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2354-2371
  id: weinberger22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2354
  lastpage: 2371
  published: 2022-05-03 00:00:00 +0000
- title: ' Unifying Importance Based Regularisation Methods for Continual Learning '
  abstract: ' Continual Learning addresses the challenge of learning a number of different tasks sequentially. The goal of maintaining knowledge of earlier tasks without re-accessing them starkly conflicts with standard SGD training for artificial neural networks. Influential methods to tackle this problem without storing old data are so-called regularisation approaches. They measure the importance of each parameter for solving a given task and subsequently protect important parameters from large changes. In the literature, three ways to measure parameter importance have been put forward and they have inspired a large body of follow-up work. Here, we present strong theoretical and empirical evidence that these three methods, Elastic Weight Consolidation (EWC), Synaptic Intelligence (SI) and Memory Aware Synapses (MAS), are surprisingly similar and are all linked to the same theoretical quantity. Concretely, we show that, despite stemming from very different motivations, both SI and MAS approximate the square root of the Fisher Information, with the Fisher being the theoretically justified basis of EWC. Moreover, we show that for SI the relation to the Fisher – and in fact its performance – is due to a previously unknown bias. On top of uncovering unknown similarities and unifying regularisation approaches, we also demonstrate that our insights enable practical performance improvements for large batch training. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/benzing22a.html
  PDF: https://proceedings.mlr.press/v151/benzing22a/benzing22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-benzing22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Frederik
    family: Benzing
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2372-2396
  id: benzing22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2372
  lastpage: 2396
  published: 2022-05-03 00:00:00 +0000
- title: ' Differentially Private Histograms under Continual Observation: Streaming Selection into the Unknown '
  abstract: ' We generalize the continuous observation privacy setting from Dwork et al. and Chan et al. by allowing each event in a stream to be a subset of some (possibly unknown) universe of items. We design differentially private (DP) algorithms for histograms in several settings, including top-k selection, with privacy loss that scales with polylog(T), where T is the maximum length of the input stream. We present a meta-algorithm that can use existing one-shot top-k private algorithms as a subroutine to continuously release DP histograms from a stream. Further, we present more practical DP algorithms for two settings: 1) continuously releasing the top-k counts from a histogram over a known domain when an event can consist of an arbitrary number of items, and 2) continuously releasing histograms over an unknown domain when an event has a limited number of items. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/rivera-cardoso22a.html
  PDF: https://proceedings.mlr.press/v151/rivera-cardoso22a/rivera-cardoso22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-rivera-cardoso22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Adrian
    family: Rivera Cardoso
  - given: Ryan
    family: Rogers
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2397-2419
  id: rivera-cardoso22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2397
  lastpage: 2419
  published: 2022-05-03 00:00:00 +0000
- title: ' Confident Least Square Value Iteration with Local Access to a Simulator '
  abstract: ' Learning with simulators is ubiquitous in modern reinforcement learning (RL). The simulator can either correspond to a simplified version of the real environment (such as a physics simulation of a robot arm) or to the environment itself (such as in games like Atari and Go). Among algorithms that are provably sample-efficient in this setting, most make the unrealistic assumption that all possible environment states are known before learning begins, or perform global optimistic planning which is computationally inefficient. In this work, we focus on simulation-based RL under a more realistic local access protocol, where the state space is unknown and the simulator can only be queried at states that have previously been observed (initial states and those returned by previous queries). We propose an algorithm named CONFIDENT-LSVI based on the template of least-square value iteration. CONFIDENT-LSVI incrementally builds a coreset of important states and uses the simulator to revisit them. Assuming that the linear function class has low approximation error under the Bellman optimality operator (a.k.a. low inherent Bellman error), we bound the algorithm performance in terms of this error, and show that it is query- and computationally-efficient. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hao22a.html
  PDF: https://proceedings.mlr.press/v151/hao22a/hao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Botao
    family: Hao
  - given: Nevena
    family: Lazic
  - given: Dong
    family: Yin
  - given: Yasin
    family: Abbasi-Yadkori
  - given: Csaba
    family: Szepesvari
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2420-2435
  id: hao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2420
  lastpage: 2435
  published: 2022-05-03 00:00:00 +0000
- title: ' Safe Optimal Design with Applications in Off-Policy Learning '
  abstract: ' Motivated by practical needs in online experimentation and off-policy learning, we study the problem of safe optimal design, where we develop a data logging policy that efficiently explores while achieving competitive rewards with a baseline production policy. We first show, perhaps surprisingly, that a common practice of mixing the production policy with uniform exploration, despite being safe, is sub-optimal in maximizing information gain. Then we propose a safe optimal logging policy for the case when no side information about the actions’ expected rewards is available. We improve upon this design by considering side information and also extend both approaches to a large number of actions with a linear reward model. We analyze how our data logging policies impact errors in off-policy learning. Finally, we empirically validate the benefit of our designs by conducting extensive experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhu22a.html
  PDF: https://proceedings.mlr.press/v151/zhu22a/zhu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ruihao
    family: Zhu
  - given: Branislav
    family: Kveton
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2436-2447
  id: zhu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2436
  lastpage: 2447
  published: 2022-05-03 00:00:00 +0000
- title: ' Accurate Shapley Values for explaining tree-based models '
  abstract: ' Although Shapley Values (SV) are widely used in explainable AI, they can be poorly understood and estimated, implying that their analysis may lead to spurious inferences and explanations. As a starting point, we recall an invariance principle for SV and derive the correct approach for computing the SV of categorical variables that are particularly sensitive to the encoding used. In the case of tree-based models, we introduce two estimators of Shapley Values that exploit the tree structure efficiently and are more accurate than state-of-the-art methods. Simulations and comparisons are performed with state-of-the-art algorithms and show the practical gain of our approach. Finally, we discuss the ability of SV to provide reliable local explanations. We also provide a Python package that computes our estimators at https://github.com/salimamoukou/acv00. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/amoukou22a.html
  PDF: https://proceedings.mlr.press/v151/amoukou22a/amoukou22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-amoukou22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Salim I.
    family: Amoukou
  - given: Tangi
    family: Salaün
  - given: Nicolas
    family: Brunel
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2448-2465
  id: amoukou22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2448
  lastpage: 2465
  published: 2022-05-03 00:00:00 +0000
- title: ' A Single-Timescale Method for Stochastic Bilevel Optimization '
  abstract: ' Stochastic bilevel optimization generalizes the classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, bilevel optimization has been regaining popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta learning. To solve this class of optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single-loop fashion, and uses a single-timescale update with a fixed batch size. To achieve an $\epsilon$-stationary point of the bilevel problem, STABLE requires ${\cal O}(\epsilon^{-2})$ samples in total; and to achieve an $\epsilon$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(\epsilon^{-1})$ samples. To the best of our knowledge, when STABLE was proposed, it was the first bilevel optimization algorithm achieving the same order of sample complexity as SGD for single-level stochastic optimization. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22e.html
  PDF: https://proceedings.mlr.press/v151/chen22e/chen22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tianyi
    family: Chen
  - given: Yuejiao
    family: Sun
  - given: Quan
    family: Xiao
  - given: Wotao
    family: Yin
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2466-2488
  id: chen22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2466
  lastpage: 2488
  published: 2022-05-03 00:00:00 +0000
- title: ' Strategic ranking '
  abstract: ' Strategic classification studies the design of a classifier robust to the manipulation of input by strategic individuals. However, the existing literature does not consider the effect of competition among individuals as induced by the algorithm design. Motivated by constrained allocation settings such as college admissions, we introduce strategic ranking, in which the (designed) individual reward depends on an applicant’s post-effort rank in a measurement of interest. Our results illustrate how competition among applicants affects the resulting equilibria and model insights. We analyze how various ranking reward designs, belonging to a family of step functions, trade off applicant, school, and societal utility, as well as how ranking design counters inequities arising from disparate access to resources. In particular, we find that randomization in the reward design can mitigate two measures of disparate impact, welfare gap and access. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liu22b.html
  PDF: https://proceedings.mlr.press/v151/liu22b/liu22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liu22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lydia T.
    family: Liu
  - given: Nikhil
    family: Garg
  - given: Christian
    family: Borgs
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2489-2518
  id: liu22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2489
  lastpage: 2518
  published: 2022-05-03 00:00:00 +0000
- title: ' Encrypted Linear Contextual Bandit '
  abstract: ' Contextual bandit is a general framework for online learning in sequential decision-making problems that has found application in a wide range of domains, including recommendation systems, online advertising, and clinical trials. A critical aspect of bandit methods is that they require observing the contexts (i.e., individual or group-level data) and rewards in order to solve the sequential problem. Their wide deployment in industrial applications has increased interest in methods that preserve the users’ privacy. In this paper, we introduce a privacy-preserving bandit framework based on homomorphic encryption which allows computations using encrypted data. The algorithm only observes encrypted information (contexts and rewards) and has no ability to decrypt it. Leveraging the properties of homomorphic encryption, we show that despite the complexity of the setting, it is possible to achieve a $\widetilde{O}(d\sqrt{T})$ regret bound in any linear contextual bandit problem while keeping data encrypted. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/garcelon22a.html
  PDF: https://proceedings.mlr.press/v151/garcelon22a/garcelon22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-garcelon22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Evrard
    family: Garcelon
  - given: Matteo
    family: Pirotta
  - given: Vianney
    family: Perchet
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2519-2551
  id: garcelon22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2519
  lastpage: 2551
  published: 2022-05-03 00:00:00 +0000
- title: ' Density Ratio Estimation via Infinitesimal Classification '
  abstract: ' Density ratio estimation (DRE) is a fundamental machine learning technique for comparing two probability distributions. However, existing methods struggle in high-dimensional settings, as it is difficult to accurately compare probability distributions based on finite samples. In this work we propose DRE-$\infty$, a divide-and-conquer approach to reduce DRE to a series of easier subproblems. Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions. We then estimate the instantaneous rate of change of the bridge distributions indexed by time (the “time score”)—a quantity defined analogously to data (Stein) scores—with a novel time score matching objective. Crucially, the learned time scores can then be integrated to compute the desired density ratio. In addition, we show that traditional (Stein) scores can be used to obtain integration paths that connect regions of high density in both distributions, improving performance in practice. Empirically, we demonstrate that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/choi22a.html
  PDF: https://proceedings.mlr.press/v151/choi22a/choi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-choi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kristy
    family: Choi
  - given: Chenlin
    family: Meng
  - given: Yang
    family: Song
  - given: Stefano
    family: Ermon
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2552-2573
  id: choi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2552
  lastpage: 2573
  published: 2022-05-03 00:00:00 +0000
- title: ' AdaBlock: SGD with Practical Block Diagonal Matrix Adaptation for Deep Learning '
  abstract: ' We introduce AdaBlock, a class of adaptive gradient methods that extends popular approaches such as Adam by adopting the simple and natural idea of using block-diagonal matrix adaptation to effectively utilize structural characteristics of deep learning architectures. Unlike other quadratic or block-diagonal approaches, AdaBlock has complete freedom to select block-diagonal groups, providing a wider trade-off applicable even to extremely high-dimensional problems. We provide convergence and generalization error bounds for AdaBlock, and study both theoretically and empirically the impact of the block size on the bounds and advantages over usual diagonal approaches. In addition, we propose a randomized layer-wise variant of AdaBlock to further reduce computations and memory footprint, and devise an efficient spectrum-clipping scheme for AdaBlock to benefit from SGD’s superior generalization performance. Extensive experiments on several deep learning tasks demonstrate the benefits of block diagonal adaptation compared to adaptive diagonal methods, vanilla SGD, as well as modified versions of full-matrix adaptation. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yun22a.html
  PDF: https://proceedings.mlr.press/v151/yun22a/yun22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yun22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jihun
    family: Yun
  - given: Aurelie
    family: Lozano
  - given: Eunho
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2574-2606
  id: yun22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2574
  lastpage: 2606
  published: 2022-05-03 00:00:00 +0000
- title: ' Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Completion '
  abstract: ' Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields in science and engineering. A fundamental task is tensor completion, which aims to faithfully recover the tensor from a small subset of its entries in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm to directly recover the tensor factors with tailored spectral initializations, and shows that it provably converges at a linear rate independent of the condition number of the ground truth tensor for tensor completion as soon as the sample size is above the order of $n^{3/2}$ ignoring other parameter dependencies, where $n$ is the dimension of the tensor. To the best of our knowledge, ScaledGD is the first algorithm that achieves near-optimal statistical and computational complexities simultaneously for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tong22a.html
  PDF: https://proceedings.mlr.press/v151/tong22a/tong22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tong22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tian
    family: Tong
  - given: Cong
    family: Ma
  - given: Ashley
    family: Prater-Bennette
  - given: Erin
    family: Tripp
  - given: Yuejie
    family: Chi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2607-2617
  id: tong22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2607
  lastpage: 2617
  published: 2022-05-03 00:00:00 +0000
- title: ' Pairwise Supervision Can Provably Elicit a Decision Boundary '
  abstract: ' Similarity learning is a general problem to elicit useful representations by predicting the relationship between a pair of patterns. This problem is related to various important preprocessing tasks such as metric learning, kernel learning, and contrastive learning. A classifier built upon the representations is expected to perform well in downstream classification; however, little theory has been given in the literature so far and thereby the relationship between similarity and classification has remained elusive. Therefore, we tackle a fundamental question: can similarity information provably lead a model to perform well in downstream classification? In this paper, we reveal that a product-type formulation of similarity learning is strongly related to an objective of binary classification. We further show that these two different problems are explicitly connected by an excess risk bound. Consequently, our results elucidate that similarity learning is capable of solving binary classification by directly eliciting a decision boundary. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bao22a.html
  PDF: https://proceedings.mlr.press/v151/bao22a/bao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Han
    family: Bao
  - given: Takuya
    family: Shimada
  - given: Liyuan
    family: Xu
  - given: Issei
    family: Sato
  - given: Masashi
    family: Sugiyama
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2618-2640
  id: bao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2618
  lastpage: 2640
  published: 2022-05-03 00:00:00 +0000
- title: ' An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization '
  abstract: ' A popular assumption for out-of-distribution generalization is that the training data comprises sub-datasets, each drawn from a distinct distribution; the goal is then to "interpolate" these distributions and "extrapolate" beyond them—this objective is broadly known as domain generalization. A common belief is that ERM can interpolate but not extrapolate and that the latter task is considerably more difficult, but these claims are vague and lack formal justification. In this work, we recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions. Under an existing notion of inter- and extrapolation based on reweighting of sub-group likelihoods, we rigorously demonstrate that extrapolation is computationally much harder than interpolation, though their statistical complexity is not significantly different. Furthermore, we show that ERM—possibly with added structured noise—is provably minimax-optimal for both tasks. Our framework presents a new avenue for the formal analysis of domain generalization algorithms which may be of independent interest. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/rosenfeld22a.html
  PDF: https://proceedings.mlr.press/v151/rosenfeld22a/rosenfeld22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-rosenfeld22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Elan
    family: Rosenfeld
  - given: Pradeep
    family: Ravikumar
  - given: Andrej
    family: Risteski
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2641-2657
  id: rosenfeld22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2641
  lastpage: 2657
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction '
  abstract: ' In this paper, we study the convergence properties of off-policy policy optimization algorithms with state-action density ratio correction in the function approximation setting, where the objective function is formulated as a max-max-min problem. We first clearly characterize the bias of the learning objective, and then present two strategies with finite-time convergence guarantees. In our first strategy, we propose an algorithm called P-SREDA with convergence rate $O(\epsilon^{-3})$, whose dependency on $\epsilon$ is optimal. In our second strategy, we design a new off-policy actor-critic style algorithm named O-SPIM. We prove that O-SPIM converges to a stationary point with total complexity $O(\epsilon^{-4})$, which matches the convergence rate of some recent actor-critic algorithms in the on-policy setting. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/huang22a.html
  PDF: https://proceedings.mlr.press/v151/huang22a/huang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-huang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiawei
    family: Huang
  - given: Nan
    family: Jiang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2658-2705
  id: huang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2658
  lastpage: 2705
  published: 2022-05-03 00:00:00 +0000
- title: ' Loss as the Inconsistency of a Probabilistic Dependency Graph: Choose Your Model, Not Your Loss Function '
  abstract: ' In a world blessed with a great diversity of loss functions, we argue that the choice between them is not a matter of taste or pragmatics, but of model. Probabilistic dependency graphs (PDGs) are probabilistic models that come equipped with a measure of "inconsistency". We prove that many standard loss functions arise as the inconsistency of a natural PDG describing the appropriate scenario, and use the same approach to justify a well-known connection between regularizers and priors. We also show that the PDG inconsistency captures a large class of statistical divergences, and detail benefits of thinking of them in this way, including an intuitive visual language for deriving inequalities between them. In variational inference, we find that the ELBO, a somewhat opaque objective for latent variable models, and variants of it arise for free out of uncontroversial modeling assumptions, as do simple graphical proofs of their corresponding bounds. Finally, we observe that inconsistency becomes the log partition function (free energy) in the setting where PDGs are factor graphs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/richardson22b.html
  PDF: https://proceedings.mlr.press/v151/richardson22b/richardson22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-richardson22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Oliver E.
    family: Richardson
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2706-2735
  id: richardson22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2706
  lastpage: 2735
  published: 2022-05-03 00:00:00 +0000
- title: ' Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games '
  abstract: ' Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhao22b.html
  PDF: https://proceedings.mlr.press/v151/zhao22b/zhao22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhao22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yulai
    family: Zhao
  - given: Yuandong
    family: Tian
  - given: Jason
    family: Lee
  - given: Simon
    family: Du
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2736-2761
  id: zhao22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2736
  lastpage: 2761
  published: 2022-05-03 00:00:00 +0000
- title: ' One-bit Submission for Locally Private Quasi-MLE: Its Asymptotic Normality and Limitation '
  abstract: ' Local differential privacy (LDP) is an information-theoretic privacy definition suitable for statistical surveys that involve an untrusted data curator. An LDP version of the quasi-maximum likelihood estimator (QMLE) has been developed, but the existing method to build an LDP QMLE is difficult to implement for a large-scale survey system in the real world due to long waiting times, expensive communication costs, and the boundedness assumption on the derivative of the log-likelihood function. We provide alternative LDP protocols without those issues, which are potentially much more easily deployable to a large-scale survey. We also provide sufficient conditions for the consistency and asymptotic normality of our protocol, along with its limitations. Our protocol is less burdensome for the users, and the theoretical guarantees cover more realistic cases than those for the existing method. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ono22a.html
  PDF: https://proceedings.mlr.press/v151/ono22a/ono22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ono22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hajime
    family: Ono
  - given: Kazuhiro
    family: Minami
  - given: Hideitsu
    family: Hino
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2762-2783
  id: ono22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2762
  lastpage: 2783
  published: 2022-05-03 00:00:00 +0000
- title: ' Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably '
  abstract: ' We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^*\in R^{d\times d}$ from a noisy observation $Y$ using an over-parameterized model. In particular, we parameterize the rank one matrix $Y^*$ by $XX^\top$, where $X\in R^{d\times d}$. We then show that under mild conditions, the estimator obtained by the randomly perturbed gradient descent algorithm using the square loss function attains a mean square error of $O(\sigma^2/d)$, where $\sigma^2$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^2)$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training over-parameterized neural networks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liu22c.html
  PDF: https://proceedings.mlr.press/v151/liu22c/liu22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liu22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tianyi
    family: Liu
  - given: Yan
    family: Li
  - given: Enlu
    family: Zhou
  - given: Tuo
    family: Zhao
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2784-2802
  id: liu22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2784
  lastpage: 2802
  published: 2022-05-03 00:00:00 +0000
- title: ' Robust Deep Learning from Crowds with Belief Propagation '
  abstract: ' Crowdsourcing systems enable us to collect large-scale datasets, but inherently suffer from noisy labels provided by low-paid workers. We address the inference and learning problems using such a noisy crowdsourced dataset. Due to the sparse nature of crowdsourcing, it is critical to exploit both a probabilistic model to capture worker priors and a neural network to extract task features, despite the risks of wrong priors and overfitted features in practice. We hence establish a neural-powered Bayesian framework, from which we devise deepMF and deepBP with different choices of variational approximation methods, mean field (MF) and belief propagation (BP), respectively. This provides a unified view of existing methods, which are special cases of deepMF with different priors. In addition, our empirical study suggests that deepBP is a new approach, which is more robust against wrong priors, feature overfitting and extreme workers thanks to BP being more sophisticated than MF. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kim22a.html
  PDF: https://proceedings.mlr.press/v151/kim22a/kim22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kim22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hoyoung
    family: Kim
  - given: Seunghyuk
    family: Cho
  - given: Dongwoo
    family: Kim
  - given: Jungseul
    family: Ok
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2803-2822
  id: kim22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2803
  lastpage: 2822
  published: 2022-05-03 00:00:00 +0000
- title: ' Sampling from Arbitrary Functions via PSD Models '
  abstract: ' In many areas of applied statistics and machine learning, generating an arbitrary number of independent and identically distributed (i.i.d.) samples from a given distribution is a key task. When the distribution is known only through evaluations of the density, current methods either scale badly with the dimension or require very involved implementations. Instead, we take a two-step approach by first modeling the probability distribution and then sampling from that model. We use the recently introduced class of positive semi-definite (PSD) models which have been shown to be e '
  volume: 151
  URL: https://proceedings.mlr.press/v151/marteau-ferey22a.html
  PDF: https://proceedings.mlr.press/v151/marteau-ferey22a/marteau-ferey22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-marteau-ferey22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ulysse
    family: Marteau-Ferey
  - given: Francis
    family: Bach
  - given: Alessandro
    family: Rudi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2823-2861
  id: marteau-ferey22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2823
  lastpage: 2861
  published: 2022-05-03 00:00:00 +0000
- title: ' Uncertainty Quantification for Bayesian Optimization '
  abstract: ' Bayesian optimization is a class of global optimization techniques. In Bayesian optimization, the underlying objective function is modeled as a realization of a Gaussian process. Although the Gaussian process assumption implies a random distribution of the Bayesian optimization outputs, quantification of this uncertainty is rarely studied in the literature. In this work, we propose a novel approach to assess the output uncertainty of Bayesian optimization algorithms, which proceeds by constructing confidence regions of the maximum point (or value) of the objective function. These regions can be computed efficiently, and their confidence levels are guaranteed by the uniform error bounds for sequential Gaussian process regression newly developed in the present work. Our theory provides a unified uncertainty quantification framework for all existing sequential sampling policies and stopping criteria. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tuo22a.html
  PDF: https://proceedings.mlr.press/v151/tuo22a/tuo22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tuo22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rui
    family: Tuo
  - given: Wenjia
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2862-2884
  id: tuo22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2862
  lastpage: 2884
  published: 2022-05-03 00:00:00 +0000
- title: ' Metalearning Linear Bandits by Prior Update '
  abstract: ' Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior. In practice, such information is often lacking. This problem is exacerbated in setups with partial information, where a misspecified prior may lead to poor exploration and performance. In this work we prove, in the context of stochastic linear bandits and Gaussian priors, that as long as the prior is sufficiently close to the true prior, the performance of the applied algorithm is close to that of the algorithm that uses the true prior. Furthermore, we address the task of learning the prior through metalearning, where a learner updates her estimate of the prior across multiple task instances in order to improve performance on future tasks. We provide an algorithm and regret bounds, demonstrate its effectiveness in comparison to an algorithm that knows the correct prior, and support our theoretical results empirically. Our theoretical results hold for a broad class of algorithms, including Thompson Sampling and Information Directed Sampling. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/peleg22a.html
  PDF: https://proceedings.mlr.press/v151/peleg22a/peleg22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-peleg22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Amit
    family: Peleg
  - given: Naama
    family: Pearl
  - given: Ron
    family: Meir
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2885-2926
  id: peleg22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2885
  lastpage: 2926
  published: 2022-05-03 00:00:00 +0000
- title: ' Fast Rank-1 NMF for Missing Data with KL Divergence '
  abstract: ' We propose a fast non-gradient-based method of rank-1 non-negative matrix factorization (NMF) for missing data, called A1GM, that minimizes the KL divergence from an input matrix to the reconstructed rank-1 matrix. Our method is based on our new finding of a closed-form formula for the best rank-1 non-negative multiple matrix factorization (NMMF), a variant of NMF. NMMF is known to exactly solve NMF for missing data if the positions of missing values satisfy a certain condition, and A1GM transforms a given matrix so that the analytical solution to NMMF can be applied. We empirically show that A1GM is more efficient than a gradient method, with competitive reconstruction errors. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ghalamkari22a.html
  PDF: https://proceedings.mlr.press/v151/ghalamkari22a/ghalamkari22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ghalamkari22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kazu
    family: Ghalamkari
  - given: Mahito
    family: Sugiyama
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2927-2940
  id: ghalamkari22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2927
  lastpage: 2940
  published: 2022-05-03 00:00:00 +0000
- title: ' Randomized Stochastic Gradient Descent Ascent '
  abstract: ' An increasing number of machine learning problems, such as robust or adversarial variants of existing algorithms, require minimizing a loss function that is itself defined as a maximum. Carrying out a loop of stochastic gradient ascent (SGA) steps on the (inner) maximization problem, followed by an SGD step on the (outer) minimization, is known as Epoch Stochastic Gradient Descent Ascent (ESGDA). While successful in practice, the theoretical analysis of ESGDA remains challenging, with no clear guidance on choices for the inner loop size nor on the interplay between inner/outer step sizes. We propose RSGDA (Randomized SGDA), a variant of ESGDA with a stochastic loop size and a simpler theoretical analysis. RSGDA comes with the first (among SGDA algorithms) almost sure convergence rates when used on nonconvex min/strongly-concave max settings. RSGDA can be parameterized using optimal loop sizes that guarantee the best convergence rates known to hold for SGDA. We test RSGDA on toy and larger scale problems, using distributionally robust optimization and single-cell data matching using optimal transport as a testbed. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sebbouh22a.html
  PDF: https://proceedings.mlr.press/v151/sebbouh22a/sebbouh22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sebbouh22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Othmane
    family: Sebbouh
  - given: Marco
    family: Cuturi
  - given: Gabriel
    family: Peyré
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2941-2969
  id: sebbouh22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2941
  lastpage: 2969
  published: 2022-05-03 00:00:00 +0000
- title: ' Aligned Multi-Task Gaussian Process '
  abstract: ' Multi-task learning requires accurate identification of the correlations between tasks. In real-world time-series, tasks are rarely perfectly temporally aligned; traditional multi-task models do not account for this and subsequent errors in correlation estimation will result in poor predictive performance and uncertainty quantification. We introduce a method that automatically accounts for temporal misalignment in a unified generative model that improves predictive performance. Our method uses Gaussian processes (GPs) to model the correlations both within and between the tasks. Building on the previous work by Kazlauskaite et al. (2019), we include a separate monotonic warp of the input data to model temporal misalignment. In contrast to previous work, we formulate a lower bound that accounts for uncertainty in both the estimates of the warping process and the underlying functions. Also, our new take on a monotonic stochastic process, with efficient path-wise sampling for the warp functions, allows us to perform full Bayesian inference in the model rather than MAP estimates. Missing data experiments, on synthetic and real time-series, demonstrate the advantages of accounting for misalignments (vs standard unaligned method) as well as modelling the uncertainty in the warping process (vs baseline MAP alignment approach). '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mikheeva22a.html
  PDF: https://proceedings.mlr.press/v151/mikheeva22a/mikheeva22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mikheeva22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Olga
    family: Mikheeva
  - given: Ieva
    family: Kazlauskaite
  - given: Adam
    family: Hartshorne
  - given: Hedvig
    family: Kjellström
  - given: Carl Henrik
    family: Ek
  - given: Neill
    family: Campbell
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2970-2988
  id: mikheeva22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2970
  lastpage: 2988
  published: 2022-05-03 00:00:00 +0000
- title: ' Generative Models as Distributions of Functions '
  abstract: ' Generative models are typically trained on grid-like data such as images. As a result, the size of these models usually scales directly with the underlying grid resolution. In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. We then build generative models by learning distributions over such functions. By treating data points as functions, we can abstract away from the specific type of data we train on and construct models that are agnostic to discretization. To train our model, we use an adversarial approach with a discriminator that acts on continuous signals. Through experiments on a wide variety of data modalities including images, 3D shapes and climate data, we demonstrate that our model can learn rich distributions of functions independently of data type and resolution. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dupont22a.html
  PDF: https://proceedings.mlr.press/v151/dupont22a/dupont22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dupont22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Emilien
    family: Dupont
  - given: Yee Whye
    family: Teh
  - given: Arnaud
    family: Doucet
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 2989-3015
  id: dupont22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 2989
  lastpage: 3015
  published: 2022-05-03 00:00:00 +0000
- title: ' ContextGen: Targeted Data Generation for Low Resource Domain Specific Text Classification '
  abstract: ' To address the challenging low-resource non-topical text classification problems in domain-specific settings, we introduce ContextGen – a novel approach that uses targeted text generation with no fine-tuning to augment the available small annotated dataset. It first adapts the powerful GPT-2 text generation model to generate samples relevant for the domain by using properly designed context text as input for generation. Then it assigns class labels to the newly generated samples, after which they are added to the initial training set. We demonstrate the superior performance of a state-of-the-art text classifier trained with the augmented labelled dataset for four different non-topical tasks in the low-resource setting, three of which are from specialized domains. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/fromme22a.html
  PDF: https://proceedings.mlr.press/v151/fromme22a/fromme22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-fromme22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lukas
    family: Fromme
  - given: Jasmina
    family: Bogojeska
  - given: Jonas
    family: Kuhn
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3016-3027
  id: fromme22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3016
  lastpage: 3027
  published: 2022-05-03 00:00:00 +0000
- title: ' Super-Acceleration with Cyclical Step-sizes '
  abstract: ' We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show that under some assumption on the spectral gap of Hessians in machine learning, cyclical step-sizes are provably faster than constant step-sizes. More precisely, we develop a convergence rate analysis for quadratic objectives that provides optimal parameters and shows that cyclical learning rates can improve upon traditional lower complexity bounds. We further propose a systematic approach to design optimal first order methods for quadratic minimization with a given spectral structure. Finally, we provide a local convergence rate analysis beyond quadratic minimization for the proposed methods and illustrate our findings through benchmarks on least squares and logistic regression problems. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/goujaud22a.html
  PDF: https://proceedings.mlr.press/v151/goujaud22a/goujaud22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-goujaud22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Baptiste
    family: Goujaud
  - given: Damien
    family: Scieur
  - given: Aymeric
    family: Dieuleveut
  - given: Adrien B.
    family: Taylor
  - given: Fabian
    family: Pedregosa
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3028-3065
  id: goujaud22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3028
  lastpage: 3065
  published: 2022-05-03 00:00:00 +0000
- title: ' On PAC-Bayesian reconstruction guarantees for VAEs '
  abstract: ' Despite its wide use and empirical successes, the theoretical understanding and study of the behaviour and performance of the variational autoencoder (VAE) have only emerged in the past few years. We contribute to this recent line of work by analysing the VAE’s reconstruction ability for unseen test data, leveraging arguments from the PAC-Bayes theory. We provide generalisation bounds on the theoretical reconstruction error, and provide insights on the regularisation effect of VAE objectives. We illustrate our theoretical results with supporting experiments on classical benchmark datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cherief-abdellatif22a.html
  PDF: https://proceedings.mlr.press/v151/cherief-abdellatif22a/cherief-abdellatif22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cherief-abdellatif22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Badr-Eddine
    family: Chérief-Abdellatif
  - given: Yuyang
    family: Shi
  - given: Arnaud
    family: Doucet
  - given: Benjamin
    family: Guedj
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3066-3079
  id: cherief-abdellatif22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3066
  lastpage: 3079
  published: 2022-05-03 00:00:00 +0000
- title: ' MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption '
  abstract: ' An unresolved problem in Deep Learning is the ability of neural networks to cope with domain shifts during test-time, imposed by commonly fixing network parameters after training. Our proposed method Meta Test-Time Training (MT3), however, breaks this paradigm and enables adaption at test-time. We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions. By minimizing the self-supervised loss, we learn task-specific model parameters for different tasks. A meta-model is optimized such that its adaption to the different task-specific models leads to higher performance on those tasks. During test-time a single unlabeled image is sufficient to adapt the meta-model parameters. This is achieved by minimizing only the self-supervised loss component resulting in a better prediction for that image. Our approach significantly improves the state-of-the-art results on the CIFAR-10-Corrupted image classification benchmark. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bartler22a.html
  PDF: https://proceedings.mlr.press/v151/bartler22a/bartler22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bartler22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Bartler
  - given: Andre
    family: Bühler
  - given: Felix
    family: Wiewel
  - given: Mario
    family: Döbler
  - given: Bin
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3080-3090
  id: bartler22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3080
  lastpage: 3090
  published: 2022-05-03 00:00:00 +0000
- title: ' Random Effect Bandits '
  abstract: ' This paper studies regret minimization in a multi-armed bandit. It is well known that side information, such as the prior distribution of arm means in Thompson sampling, can improve the statistical efficiency of the bandit algorithm. While the prior is a blessing when correctly specified, it is a curse when misspecified. To address this issue, we introduce the assumption of a random-effect model to bandits. In this model, the mean arm rewards are drawn independently from an unknown distribution, which we estimate. We derive a random-effect estimator of the arm means, analyze its uncertainty, and design a UCB algorithm ReUCB that uses it. We analyze ReUCB and derive an upper bound on its n-round Bayes regret, which improves upon not using the random-effect structure. Our experiments show that ReUCB can outperform Thompson sampling, without knowing the prior distribution of arm means. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhu22b.html
  PDF: https://proceedings.mlr.press/v151/zhu22b/zhu22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhu22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rong
    family: Zhu
  - given: Branislav
    family: Kveton
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3091-3107
  id: zhu22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3091
  lastpage: 3107
  published: 2022-05-03 00:00:00 +0000
- title: ' DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search '
  abstract: ' Kernel Density Estimation (KDE) is a nonparametric method for estimating the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in algorithms for nearest neighbor search. We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN) where we apply Approximate Nearest Neighbor (ANN) algorithms as a black box subroutine to compute an unbiased KDE. The idea is to find points that have a large contribution to the KDE using ANN, compute their contribution exactly, and approximate the remainder with Random Sampling (RS). We present a theoretical argument that supports the idea that an ANN subroutine can speed up the evaluation. Furthermore, we provide a C++ implementation with a Python interface that can make use of an arbitrary ANN implementation as a subroutine for KDE evaluation. We show empirically that our implementation outperforms state of the art implementations in all high dimensional datasets we considered, and matches the performance of RS in cases where the ANN yields no gains in performance. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/karppa22a.html
  PDF: https://proceedings.mlr.press/v151/karppa22a/karppa22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-karppa22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matti
    family: Karppa
  - given: Martin
    family: Aumüller
  - given: Rasmus
    family: Pagh
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3108-3137
  id: karppa22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3108
  lastpage: 3137
  published: 2022-05-03 00:00:00 +0000
- title: ' Embedded Ensembles: infinite width limit and operating regimes '
  abstract: ' A memory efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as <span class="emphasized">Embedded Ensembling</span> (EE); its particular examples are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper we perform a systematic theoretical and empirical analysis of embedded ensembles with different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes - <span class="emphasized">independent</span> and <span class="emphasized">collective</span> - depending on the architecture and initialization strategy of ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical prediction with a wide range of experiments with finite networks, and further study empirically various effects such as transition between the two regimes, scaling of ensemble performance with the network width and number of models, and dependence of performance on a number of architecture and hyperparameter choices. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/velikanov22a.html
  PDF: https://proceedings.mlr.press/v151/velikanov22a/velikanov22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-velikanov22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Maksim
    family: Velikanov
  - given: Roman V.
    family: Kail
  - given: Ivan
    family: Anokhin
  - given: Roman
    family: Vashurin
  - given: Maxim
    family: Panov
  - given: Alexey
    family: Zaytsev
  - given: Dmitry
    family: Yarotsky
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3138-3163
  id: velikanov22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3138
  lastpage: 3163
  published: 2022-05-03 00:00:00 +0000
- title: ' State Dependent Performative Prediction with Stochastic Approximation '
  abstract: ' This paper studies the performative prediction problem which optimizes a stochastic loss function with data distribution that depends on the decision variable. We consider a setting where the agent(s) provides samples adapted to both the learner’s and agent’s previous states. The samples are then used by the learner to update his/her state to optimize a loss function. Such closed loop update dynamics is studied as a state dependent stochastic approximation (SA) algorithm, which is shown to find a fixed point known as the performative stable solution. Our setting captures the unforgetful nature and reliance on past experiences of agents. Our contributions are three-fold. First, we present a framework for state dependent performative prediction with biased stochastic gradients driven by a controlled Markov chain whose transition probability depends on the learner’s state. Second, we present a new finite-time performance analysis of the SA algorithm. We show that the expected squared distance to the performative stable solution decreases as O(1/k), where k is the iteration number. Third, numerical experiments verify our findings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/li22c.html
  PDF: https://proceedings.mlr.press/v151/li22c/li22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-li22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Qiang
    family: Li
  - given: Hoi-To
    family: Wai
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3164-3186
  id: li22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3164
  lastpage: 3186
  published: 2022-05-03 00:00:00 +0000
- title: ' Reward-Free Policy Space Compression for Reinforcement Learning '
  abstract: ' In reinforcement learning, we encode the potential behaviors of an agent interacting with an environment into an infinite set of policies, called policy space, typically represented by a family of parametric functions. Dealing with such a policy space is a hefty challenge, which often causes sample and computational inefficiencies. However, we argue that a limited number of policies are actually relevant when we also account for the structure of the environment and of the policy parameterization, as many of them would induce very similar interactions, i.e., state-action distributions. In this paper, we seek a reward-free compression of the policy space into a finite set of representative policies, such that, given any policy $\pi$, the minimum Rényi divergence between the state-action distributions of the representative policies and the state-action distribution of $\pi$ is bounded. We show that this compression of the policy space can be formulated as a set cover problem, and it is inherently NP-hard. Nonetheless, we propose a game-theoretic reformulation for which a locally optimal solution can be efficiently found by iteratively stretching the compressed space to cover the most challenging policy. Finally, we provide an empirical evaluation to illustrate the compression procedure in simple domains, and its ripple effects in reinforcement learning. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mutti22a.html
  PDF: https://proceedings.mlr.press/v151/mutti22a/mutti22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mutti22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mirco
    family: Mutti
  - given: Stefano
    family: Del Col
  - given: Marcello
    family: Restelli
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3187-3203
  id: mutti22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3187
  lastpage: 3203
  published: 2022-05-03 00:00:00 +0000
- title: ' Nonstationary multi-output Gaussian processes via harmonizable spectral mixtures '
  abstract: ' Kernel design for Multi-output Gaussian Processes (MOGP) has received increased attention recently. In particular, the Multi-Output Spectral Mixture kernel (MOSM) approach has been praised as a general model in the sense that it extends other approaches such as Linear Model of Coregionalization, Intrinsic Coregionalization Model and Cross-Spectral Mixture. MOSM relies on Cramér’s theorem to parametrise the power spectral densities (PSD) as a Gaussian mixture and thus has a structural restriction: by assuming the existence of a PSD, the method is only suited for multi-output stationary processes. We develop a nonstationary extension of MOSM by proposing the family of harmonizable kernels for MOGPs, a class of kernels that contains both stationary and a vast majority of non-stationary processes. A main contribution of the proposed harmonizable kernels is that they automatically identify a possible nonstationary behaviour, meaning that practitioners do not need to choose between stationary or non-stationary kernels. The proposed method is first validated on synthetic data with the purpose of illustrating the key properties of our approach, and then compared to existing MOGP methods on two real-world settings from finance and electroencephalography. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/altamirano22a.html
  PDF: https://proceedings.mlr.press/v151/altamirano22a/altamirano22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-altamirano22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matias
    family: Altamirano
  - given: Felipe
    family: Tobar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3204-3218
  id: altamirano22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3204
  lastpage: 3218
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Quantile Functions for Temporal Point Processes with Recurrent Neural Splines '
  abstract: ' We can build flexible predictive models for rich continuous-time event data by combining the framework of temporal point processes (TPP) with (recurrent) neural networks. We propose a new neural parametrization for TPPs based on the conditional quantile function. Specifically, we use a flexible monotonic rational-quadratic spline to learn a smooth continuous quantile function. Conditioning on historical events is achieved through a recurrent neural network. This novel parametrization provides a flexible yet tractable TPP model with multiple advantages, such as analytical sampling and closed-form expressions for quantiles and prediction intervals. While neural TPP models are often trained using maximum likelihood estimation, we consider the more robust continuous ranked probability score (CRPS). We additionally derive a closed-form expression for the CRPS of our model. Finally, we demonstrate that the proposed model achieves state-of-the-art performance in standard prediction tasks on both synthetic and real-world event data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ben-taieb22a.html
  PDF: https://proceedings.mlr.press/v151/ben-taieb22a/ben-taieb22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ben-taieb22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Souhaib
    family: Ben Taieb
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3219-3241
  id: ben-taieb22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3219
  lastpage: 3241
  published: 2022-05-03 00:00:00 +0000
- title: ' Differentially Private Regression with Unbounded Covariates '
  abstract: ' We provide computationally efficient, differentially private algorithms for the classical regression settings of Least Squares Fitting, Binary Regression and Linear Regression with unbounded covariates. Prior to our work, privacy constraints in such regression settings were studied under strong a priori bounds on covariates. We consider the case of Gaussian marginals and extend recent differentially private techniques on mean and covariance estimation (Kamath et al., 2019; Karwa and Vadhan, 2018) to the sub-gaussian regime. We provide a novel technical analysis yielding differentially private algorithms for the above classical regression settings. Through the case of Binary Regression, we capture the fundamental and widely-studied models of logistic regression and linearly-separable SVMs, learning an unbiased estimate of the true regression vector, up to a scaling factor. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/milionis22a.html
  PDF: https://proceedings.mlr.press/v151/milionis22a/milionis22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-milionis22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jason
    family: Milionis
  - given: Alkis
    family: Kalavasis
  - given: Dimitris
    family: Fotakis
  - given: Stratis
    family: Ioannidis
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3242-3273
  id: milionis22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3242
  lastpage: 3273
  published: 2022-05-03 00:00:00 +0000
- title: ' Triple-Q: A Model-Free Algorithm for Constrained Reinforcement Learning with Sublinear Regret and Zero Constraint Violation '
  abstract: ' This paper presents the first model-free, simulator-free reinforcement learning algorithm for Constrained Markov Decision Processes (CMDPs) with sublinear regret and zero constraint violation. The algorithm is named Triple-Q because it includes three key components: a Q-function (also called action-value function) for the cumulative reward, a Q-function for the cumulative utility for the constraint, and a virtual-Queue that (over)-estimates the cumulative constraint violation. Under Triple-Q, at each step, an action is chosen based on the pseudo-Q-value that is a combination of the three “Q” values. The algorithm updates the reward and utility Q-values with learning rates that depend on the visit counts to the corresponding (state, action) pairs and are periodically reset. In the episodic CMDP setting, Triple-Q achieves $\tilde{\cal O}\left(\frac{1 }{\delta}H^4 S^{\frac{1}{2}}A^{\frac{1}{2}}K^{\frac{4}{5}} \right)$ regret, where $K$ is the total number of episodes, $H$ is the number of steps in each episode, $S$ is the number of states, $A$ is the number of actions, and $\delta$ is Slater’s constant. Furthermore, Triple-Q guarantees zero constraint violation, both in expectation and with high probability, when $K$ is sufficiently large. Finally, the computational complexity of Triple-Q is similar to that of SARSA for unconstrained MDPs, making it computationally efficient. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wei22a.html
  PDF: https://proceedings.mlr.press/v151/wei22a/wei22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wei22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Honghao
    family: Wei
  - given: Xin
    family: Liu
  - given: Lei
    family: Ying
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3274-3307
  id: wei22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3274
  lastpage: 3307
  published: 2022-05-03 00:00:00 +0000
- title: ' Measuring the robustness of Gaussian processes to kernel choice '
  abstract: ' Gaussian processes (GPs) are used to make medical and scientific decisions, including in cardiac care and monitoring of carbon dioxide emissions. Notably, the choice of GP kernel is often somewhat arbitrary. In particular, uncountably many kernels typically align with qualitative prior knowledge (e.g. function smoothness or stationarity). But in practice, data analysts choose among a handful of convenient standard kernels (e.g. squared exponential). In the present work, we ask: Would decisions made with a GP differ under other, qualitatively interchangeable kernels? We show how to formulate this sensitivity analysis as a constrained optimization problem over a finite-dimensional space. We can then use standard optimizers to identify substantive changes in relevant decisions made with a GP. We demonstrate in both synthetic and real-world examples that decisions made with a GP can exhibit substantial sensitivity to kernel choice, even when prior draws are qualitatively interchangeable to a user. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/stephenson22a.html
  PDF: https://proceedings.mlr.press/v151/stephenson22a/stephenson22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-stephenson22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: William T.
    family: Stephenson
  - given: Soumya
    family: Ghosh
  - given: Tin D.
    family: Nguyen
  - given: Mikhail
    family: Yurochkin
  - given: Sameer
    family: Deshpande
  - given: Tamara
    family: Broderick
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3308-3331
  id: stephenson22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3308
  lastpage: 3331
  published: 2022-05-03 00:00:00 +0000
- title: ' A general sample complexity analysis of vanilla policy gradient '
  abstract: ' We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in non-convex optimization to obtain convergence and sample complexity guarantees for the vanilla policy gradient (PG). Our only assumptions are that the expected return is smooth w.r.t. the policy parameters, that its $H$-step truncated gradient is close to the exact gradient, and a certain ABC assumption. This assumption requires the second moment of the estimated gradient to be bounded by $A \geq 0$ times the suboptimality gap, $B \geq 0$ times the norm of the full batch gradient and an additive constant $C \geq 0$, or any combination of the aforementioned. We show that the ABC assumption is more general than the commonly used assumptions on the policy space to prove convergence to a stationary point. We provide a single convergence theorem that recovers the $\widetilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity of PG. Our results also afford greater flexibility in the choice of hyperparameters such as the step size and place no restriction on the batch size $m$, including the single trajectory case (i.e., $m=1$). We then instantiate our theorem in different settings, where we both recover existing results and obtain improved sample complexity, e.g., for convergence to the global optimum for Fisher-non-degenerated parameterized policies. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yuan22a.html
  PDF: https://proceedings.mlr.press/v151/yuan22a/yuan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yuan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rui
    family: Yuan
  - given: Robert M.
    family: Gower
  - given: Alessandro
    family: Lazaric
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3332-3380
  id: yuan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3332
  lastpage: 3380
  published: 2022-05-03 00:00:00 +0000
- title: ' Pairwise Fairness for Ordinal Regression '
  abstract: ' We initiate the study of fairness for ordinal regression. We adapt two fairness notions previously considered in fair ranking and propose a strategy for training a predictor that is approximately fair according to either notion. Our predictor has the form of a threshold model, composed of a scoring function and a set of thresholds, and our strategy is based on a reduction to fair binary classification for learning the scoring function and local search for choosing the thresholds. We provide generalization guarantees on the error and fairness violation of our predictor, and we illustrate the effectiveness of our approach in extensive experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kleindessner22a.html
  PDF: https://proceedings.mlr.press/v151/kleindessner22a/kleindessner22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kleindessner22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matthäus
    family: Kleindessner
  - given: Samira
    family: Samadi
  - given: Muhammad
    family: Bilal Zafar
  - given: Krishnaram
    family: Kenthapadi
  - given: Chris
    family: Russell
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3381-3417
  id: kleindessner22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3381
  lastpage: 3417
  published: 2022-05-03 00:00:00 +0000
- title: ' LIMESegment: Meaningful, Realistic Time Series Explanations '
  abstract: ' LIME (Local Interpretable Model-agnostic Explanations) has become a popular way of generating explanations for tabular, image and natural language models, providing insight into why an instance was given a particular classification. In this paper we adapt LIME to time series classification, an under-explored area with existing approaches failing to account for the structure of this kind of data. We frame the non-trivial challenge of adapting LIME to time series classification as the following open questions: “What is a meaningful interpretable representation of a time series?”, “How does one realistically perturb a time series?” and “What is a local neighbourhood around a time series?”. We propose solutions to all three questions and combine them into a novel time series explanation framework called LIMESegment, which outperforms existing adaptations of LIME to time series on a variety of classification tasks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sivill22a.html
  PDF: https://proceedings.mlr.press/v151/sivill22a/sivill22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sivill22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Torty
    family: Sivill
  - given: Peter
    family: Flach
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3418-3433
  id: sivill22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3418
  lastpage: 3433
  published: 2022-05-03 00:00:00 +0000
- title: ' A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions '
  abstract: ' One of the distinguishing characteristics of modern deep learning systems is their use of neural network architectures with enormous numbers of parameters, often in the millions and sometimes even in the billions. While this paradigm has inspired significant research on the properties of large networks, relatively little work has been devoted to the fact that these networks are often used to model large complex datasets, which may themselves contain millions or even billions of constraints. In this work, we focus on this high-dimensional regime in which both the dataset size and the number of features tend to infinity. We analyze the performance of random feature regression with features $F=f(WX+B)$ for a random weight matrix $W$ and bias vector $B$, obtaining exact formulae for the asymptotic training and test errors for data generated by a linear teacher model. The role of the bias can be understood as parameterizing a distribution over activation functions, and our analysis directly generalizes to such distributions, even those not expressible with a traditional additive bias. Intriguingly, we find that a mixture of nonlinearities can improve both the training and test errors over the best single nonlinearity, suggesting that mixtures of nonlinearities might be useful for approximate kernel methods or neural network architecture design. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/adlam22a.html
  PDF: https://proceedings.mlr.press/v151/adlam22a/adlam22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-adlam22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ben
    family: Adlam
  - given: Jake A.
    family: Levinson
  - given: Jeffrey
    family: Pennington
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3434-3457
  id: adlam22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3434
  lastpage: 3457
  published: 2022-05-03 00:00:00 +0000
- title: ' Spectral Pruning for Recurrent Neural Networks '
  abstract: ' Recurrent neural networks (RNNs) are a class of neural networks used in sequential tasks. However, in general, RNNs have a large number of parameters and involve enormous computational costs by repeating the recurrent structures in many time steps. As a method to overcome this difficulty, RNN pruning has attracted increasing attention in recent years, and it brings us benefits in terms of the reduction of computational cost as the time step progresses. However, most existing methods of RNN pruning are heuristic. The purpose of this paper is to study a theoretical scheme for RNN pruning methods. We propose an appropriate pruning algorithm for RNNs inspired by "spectral pruning", and provide the generalization error bounds for compressed RNNs. We also provide numerical experiments to demonstrate our theoretical results and show the effectiveness of our pruning method compared with the existing methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/furuya22a.html
  PDF: https://proceedings.mlr.press/v151/furuya22a/furuya22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-furuya22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Takashi
    family: Furuya
  - given: Kazuma
    family: Suetake
  - given: Koichi
    family: Taniguchi
  - given: Hiroyuki
    family: Kusumoto
  - given: Ryuji
    family: Saiin
  - given: Tomohiro
    family: Daimon
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3458-3482
  id: furuya22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3458
  lastpage: 3482
  published: 2022-05-03 00:00:00 +0000
- title: ' Many processors, little time: MCMC for partitions via optimal transport couplings '
  abstract: ' Markov chain Monte Carlo (MCMC) methods are often used in clustering since they guarantee asymptotically exact expectations in the infinite-time limit. In finite time, though, slow mixing often leads to poor performance. Modern computing environments offer massive parallelism, but naive implementations of parallel MCMC can exhibit substantial bias. In MCMC samplers of continuous random variables, Markov chain couplings can overcome bias. But these approaches depend crucially on paired chains meeting after a small number of transitions. We show that straightforward applications of existing coupling ideas to discrete clustering variables fail to meet quickly. This failure arises from the "label-switching problem": semantically equivalent cluster relabelings impede fast meeting of coupled chains. We instead consider chains as exploring the space of partitions rather than partitions’ (arbitrary) labelings. Using a metric on the partition space, we formulate a practical algorithm using optimal transport couplings. Our theory confirms our method is accurate and efficient. In experiments ranging from clustering of genes or seeds to graph colorings, we show the benefits of our coupling in the highly parallel, time-limited regime. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nguyen22a.html
  PDF: https://proceedings.mlr.press/v151/nguyen22a/nguyen22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nguyen22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tin D.
    family: Nguyen
  - given: Brian L.
    family: Trippe
  - given: Tamara
    family: Broderick
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3483-3514
  id: nguyen22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3483
  lastpage: 3514
  published: 2022-05-03 00:00:00 +0000
- title: ' Sinkformers: Transformers with Doubly Stochastic Attention '
  abstract: ' Attention based models such as Transformers involve pairwise interactions between data points, modeled with a learnable attention matrix. Importantly, this attention matrix is normalized with the SoftMax operator, which makes it row-wise stochastic. In this paper, we propose instead to use Sinkhorn’s algorithm to make attention matrices doubly stochastic. We call the resulting model a Sinkformer. We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior. On the theoretical side, we show that, unlike the SoftMax operation, this normalization makes it possible to understand the iterations of self-attention modules as a discretized gradient-flow for the Wasserstein metric. We also show in the infinite number of samples limit that, when rescaling both attention matrices and depth, Sinkformers operate a heat diffusion. On the experimental side, we show that Sinkformers enhance model accuracy in vision and natural language processing tasks. In particular, on 3D shapes classification, Sinkformers lead to a significant improvement. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sander22a.html
  PDF: https://proceedings.mlr.press/v151/sander22a/sander22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sander22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Michael E.
    family: Sander
  - given: Pierre
    family: Ablin
  - given: Mathieu
    family: Blondel
  - given: Gabriel
    family: Peyré
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3515-3530
  id: sander22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3515
  lastpage: 3530
  published: 2022-05-03 00:00:00 +0000
- title: ' Finding Nearly Everything within Random Binary Networks '
  abstract: ' A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up line of theoretical work provides justification of these findings by proving that slightly overparameterized neural networks, with commonly used continuous-valued random initializations, can indeed be pruned to approximate any target network. In this work, we show that the amplitude of those random weights does not even matter. We prove that any target network of width $d$ and depth $l$ can be approximated up to arbitrary accuracy $\varepsilon$ by simply pruning a random network of binary $\{\pm1\}$ weights that is wider and deeper than the target network only by a polylogarithmic factor of $d, l$ and $\varepsilon$. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sreenivasan22a.html
  PDF: https://proceedings.mlr.press/v151/sreenivasan22a/sreenivasan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sreenivasan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kartik
    family: Sreenivasan
  - given: Shashank
    family: Rajput
  - given: Jy-Yong
    family: Sohn
  - given: Dimitris
    family: Papailiopoulos
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3531-3541
  id: sreenivasan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3531
  lastpage: 3541
  published: 2022-05-03 00:00:00 +0000
- title: ' Last Layer Marginal Likelihood for Invariance Learning '
  abstract: ' Data augmentation is often used to incorporate inductive biases into models. Traditionally, these are hand-crafted and tuned with cross validation. The Bayesian paradigm for model selection provides a path towards end-to-end learning of invariances using only the training data, by optimising the marginal likelihood. Computing the marginal likelihood is hard for neural networks, but success with tractable approaches that compute the marginal likelihood for the last layer only raises the question of whether this convenient approach might be employed for learning invariances. We show partial success on standard benchmarks, in the low-data regime and on a medical imaging dataset by designing a custom optimisation routine. Introducing a new lower bound to the marginal likelihood allows us to perform inference for a larger class of likelihood functions than before. On the other hand, we demonstrate failure modes on the CIFAR10 dataset, where the last layer approximation is not sufficient due to the increased complexity of our neural network. Our results indicate that once more sophisticated approximations become available the marginal likelihood is a promising approach for invariance learning in neural networks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/schwobel22a.html
  PDF: https://proceedings.mlr.press/v151/schwobel22a/schwobel22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-schwobel22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pola
    family: Schwöbel
  - given: Martin
    family: Jørgensen
  - given: Sebastian W.
    family: Ober
  - given: Mark
    family: Van Der Wilk
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3542-3555
  id: schwobel22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3542
  lastpage: 3555
  published: 2022-05-03 00:00:00 +0000
- title: ' Minimax Optimization: The Case of Convex-Submodular '
  abstract: ' Minimax optimization has been central in addressing various applications in machine learning, game theory, and control theory. Prior literature has thus far mainly focused on studying such problems in the continuous domain, e.g., convex-concave minimax optimization is now understood to a significant extent. Nevertheless, minimax problems extend far beyond the continuous domain to mixed continuous-discrete domains or even fully discrete domains. In this paper, we study mixed continuous-discrete minimax problems where the minimization is over a continuous variable belonging to Euclidean space and the maximization is over subsets of a given ground set. We introduce the class of convex-submodular minimax problems, where the objective is convex with respect to the continuous variable and submodular with respect to the discrete variable. Even though such problems appear frequently in machine learning applications, little is known about how to address them from algorithmic and theoretical perspectives. For such problems, we first show that obtaining saddle points is hard up to any approximation, and thus introduce new notions of (near-) optimality. We then provide several algorithmic procedures for solving convex and monotone-submodular minimax problems and characterize their convergence rates, computational complexity, and quality of the final solution according to our notions of optimality. Our proposed algorithms are iterative and combine tools from both discrete and continuous optimization. Finally, we provide numerical experiments to showcase the effectiveness of our proposed methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/adibi22a.html
  PDF: https://proceedings.mlr.press/v151/adibi22a/adibi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-adibi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arman
    family: Adibi
  - given: Aryan
    family: Mokhtari
  - given: Hamed
    family: Hassani
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3556-3580
  id: adibi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3556
  lastpage: 3580
  published: 2022-05-03 00:00:00 +0000
- title: ' Federated Learning with Buffered Asynchronous Aggregation '
  abstract: ' Scalability and privacy are two critical concerns for cross-device federated learning (FL) systems. In this work, we identify that synchronous FL cannot scale efficiently beyond a few hundred clients training in parallel. It leads to diminishing returns in model performance and training speed, analogous to large-batch training. On the other hand, asynchronous aggregation of client updates in FL (i.e., asynchronous FL) alleviates the scalability issue. However, aggregating individual client updates is incompatible with Secure Aggregation, which could result in an undesirable level of privacy for the system. To address these concerns, we propose a novel buffered asynchronous aggregation method, FedBuff, that is agnostic to the choice of optimizer, and combines the best properties of synchronous and asynchronous FL. We empirically demonstrate that FedBuff is $3.3\times$ more efficient than synchronous FL and up to $2.5\times$ more efficient than asynchronous FL, while being compatible with privacy-preserving technologies such as Secure Aggregation and differential privacy. We provide theoretical convergence guarantees in a smooth non-convex setting. Finally, we show that under differentially private training, FedBuff can outperform FedAvgM at low privacy settings and achieve the same utility for higher privacy settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nguyen22b.html
  PDF: https://proceedings.mlr.press/v151/nguyen22b/nguyen22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nguyen22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: John
    family: Nguyen
  - given: Kshitiz
    family: Malik
  - given: Hongyuan
    family: Zhan
  - given: Ashkan
    family: Yousefpour
  - given: Mike
    family: Rabbat
  - given: Mani
    family: Malek
  - given: Dzmitry
    family: Huba
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3581-3607
  id: nguyen22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3581
  lastpage: 3607
  published: 2022-05-03 00:00:00 +0000
- title: ' Bayesian Inference and Partial Identification in Multi-Treatment Causal Inference with Unobserved Confounding '
  abstract: ' In causal estimation problems, the parameter of interest is often only partially identified, implying that the parameter cannot be recovered exactly, even with infinite data. Here, we study Bayesian inference for partially identified treatment effects in multi-treatment causal inference problems with unobserved confounding. In principle, inferring the partially identified treatment effects is natural under the Bayesian paradigm, but the results can be highly sensitive to parameterization and prior specification, often in surprising ways. It is thus essential to understand which aspects of the conclusions about treatment effects are driven entirely by the prior specification. We use a so-called transparent parameterization to contextualize the effects of more interpretable scientifically motivated prior specifications on the multiple effects. We demonstrate our analysis in an example quantifying the effects of gene expression levels on mouse obesity. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zheng22a.html
  PDF: https://proceedings.mlr.press/v151/zheng22a/zheng22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zheng22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiajing
    family: Zheng
  - given: Alexander
    family: D’Amour
  - given: Alexander
    family: Franks
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3608-3626
  id: zheng22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3608
  lastpage: 3626
  published: 2022-05-03 00:00:00 +0000
- title: ' Deep Neyman-Scott Processes '
  abstract: ' A Neyman-Scott process is a special case of a Cox process. The latent and observable stochastic processes are both Poisson processes. We consider a deep Neyman-Scott process in this paper, for which the building components of a network are all Poisson processes. We develop an efficient posterior sampling via Markov chain Monte Carlo and use it for likelihood-based inference. Our method opens up room for the inference in sophisticated hierarchical point processes. We show in the experiments that more hidden Poisson processes bring better performance for likelihood fitting and event type prediction. We also compare our method with state-of-the-art models for temporal real-world datasets and demonstrate competitive abilities for both data fitting and prediction, using far fewer parameters. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hong22a.html
  PDF: https://proceedings.mlr.press/v151/hong22a/hong22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hong22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chengkuan
    family: Hong
  - given: Christian
    family: Shelton
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3627-3646
  id: hong22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3627
  lastpage: 3646
  published: 2022-05-03 00:00:00 +0000
- title: ' CATVI: Conditional and Adaptively Truncated Variational Inference for Hierarchical Bayesian Nonparametric Models '
  abstract: ' Current variational inference methods for hierarchical Bayesian nonparametric models can neither characterize the correlation structure among latent variables due to the mean-field setting, nor infer the true posterior dimension because of the universal truncation. To overcome these limitations, we propose the conditional and adaptively truncated variational inference method (CATVI) by maximizing the nonparametric evidence lower bound and integrating Monte Carlo into the variational inference framework. CATVI enjoys several advantages over traditional methods, including a smaller divergence between variational and true posteriors, reduced risk of underfitting or overfitting, and improved prediction accuracy. Empirical studies on three large datasets reveal that CATVI applied in Bayesian nonparametric topic models substantially outperforms competing models, providing lower perplexity and clearer topic-words clustering. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liu22d.html
  PDF: https://proceedings.mlr.press/v151/liu22d/liu22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liu22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yirui
    family: Liu
  - given: Xinghao
    family: Qiao
  - given: Jessica
    family: Lam
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3647-3662
  id: liu22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3647
  lastpage: 3662
  published: 2022-05-03 00:00:00 +0000
- title: ' Certifiably Robust Variational Autoencoders '
  abstract: ' We introduce an approach for training variational autoencoders (VAEs) that are certifiably robust to adversarial attack. Specifically, we first derive actionable bounds on the minimal size of an input perturbation required to change a VAE’s reconstruction by more than an allowed amount, with these bounds depending on certain key parameters such as the Lipschitz constants of the encoder and decoder. We then show how these parameters can be controlled, thereby providing a mechanism to ensure a priori that a VAE will attain a desired level of robustness. Moreover, we extend this to a complete practical approach for training such VAEs to ensure our criteria are met. Critically, our method allows one to specify a desired level of robustness upfront and then train a VAE that is guaranteed to achieve this robustness. We further demonstrate that these Lipschitz-constrained VAEs are more robust to attack than standard VAEs in practice. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/barrett22a.html
  PDF: https://proceedings.mlr.press/v151/barrett22a/barrett22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-barrett22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ben
    family: Barrett
  - given: Alexander
    family: Camuto
  - given: Matthew
    family: Willetts
  - given: Tom
    family: Rainforth
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3663-3683
  id: barrett22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3663
  lastpage: 3683
  published: 2022-05-03 00:00:00 +0000
- title: ' Practical Schemes for Finding Near-Stationary Points of Convex Finite-Sums '
  abstract: ' In convex optimization, the problem of finding near-stationary points has not been adequately studied yet, unlike other optimality measures such as the function value. Even in the deterministic case, the optimal method (OGM-G, due to Kim and Fessler (2021)) has just been discovered recently. In this work, we conduct a systematic study of algorithmic techniques for finding near-stationary points of convex finite-sums. Our main contributions are several algorithmic discoveries: (1) we discover a memory-saving variant of OGM-G based on the performance estimation problem approach (Drori and Teboulle, 2014); (2) we design a new accelerated SVRG variant that can simultaneously achieve fast rates for minimizing both the gradient norm and function value; (3) we propose an adaptively regularized accelerated SVRG variant, which does not require the knowledge of some unknown initial constants and achieves near-optimal complexities. We put an emphasis on the simplicity and practicality of the new schemes, which could facilitate future work. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhou22a.html
  PDF: https://proceedings.mlr.press/v151/zhou22a/zhou22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhou22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kaiwen
    family: Zhou
  - given: Lai
    family: Tian
  - given: Anthony
    family: Man-Cho So
  - given: James
    family: Cheng
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3684-3708
  id: zhou22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3684
  lastpage: 3708
  published: 2022-05-03 00:00:00 +0000
- title: ' On Margins and Derandomisation in PAC-Bayes '
  abstract: ' We give a general recipe for derandomising PAC-Bayesian bounds using margins, with the critical ingredient being that our randomised predictions concentrate around some value. The tools we develop straightforwardly lead to margin bounds for various classifiers, including linear prediction—a class that includes boosting and the support vector machine—single-hidden-layer neural networks with an unusual erf activation function, and deep ReLU networks. Further we extend to partially-derandomised predictors where only some of the randomness of our estimators is removed, letting us extend bounds to cases where the concentration properties of our estimators are otherwise poor. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/biggs22a.html
  PDF: https://proceedings.mlr.press/v151/biggs22a/biggs22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-biggs22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Felix
    family: Biggs
  - given: Benjamin
    family: Guedj
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3709-3731
  id: biggs22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3709
  lastpage: 3731
  published: 2022-05-03 00:00:00 +0000
- title: ' A Spectral Perspective of DNN Robustness to Label Noise '
  abstract: ' Deep networks usually require a massive amount of labeled data for their training. Yet, such data may include some mistakes in the labels. Interestingly, networks have been shown to be robust to such errors. This work uses spectral analysis of their learned mapping to provide an explanation for their robustness. In particular, we relate the smoothness regularization that usually exists in conventional training to the attenuation of high frequencies, which mainly characterize noise. By using a connection between the smoothness and the spectral norm of the network weights, we suggest that one may further improve robustness via spectral normalization. Empirical experiments validate our claims and show the advantage of this normalization for classification with label noise. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bar22a.html
  PDF: https://proceedings.mlr.press/v151/bar22a/bar22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bar22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Oshrat
    family: Bar
  - given: Amnon
    family: Drory
  - given: Raja
    family: Giryes
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3732-3752
  id: bar22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3732
  lastpage: 3752
  published: 2022-05-03 00:00:00 +0000
- title: ' Momentum Accelerates the Convergence of Stochastic AUPRC Maximization '
  abstract: ' In this paper, we study stochastic optimization of areas under precision-recall curves (AUPRC), which is widely used for combating imbalanced classification tasks. Although a few methods have been proposed for maximizing AUPRC, stochastic optimization of AUPRC with convergence guarantee remains an undeveloped territory. A state-of-the-art complexity is $O(1/\epsilon^5)$ for finding an $\epsilon$-stationary solution. In this paper, we further improve the stochastic optimization of AUPRC by (i) developing novel stochastic momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution; and (ii) designing a novel family of stochastic adaptive methods with the same iteration complexity, which enjoy faster convergence in practice. To this end, we propose two innovative techniques that are critical for improving the convergence: (i) the biased estimators for tracking individual ranking scores are updated in a randomized coordinate-wise manner; and (ii) a momentum update is used on top of the stochastic gradient estimator for tracking the gradient of the objective. The novel analysis of Adam-style updates is also one main contribution. Extensive experiments on various data sets demonstrate the effectiveness of the proposed algorithms. Of independent interest, the proposed stochastic momentum and adaptive algorithms are also applicable to a class of two-level stochastic dependent compositional optimization problems. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22b.html
  PDF: https://proceedings.mlr.press/v151/wang22b/wang22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Guanghui
    family: Wang
  - given: Ming
    family: Yang
  - given: Lijun
    family: Zhang
  - given: Tianbao
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3753-3771
  id: wang22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3753
  lastpage: 3771
  published: 2022-05-03 00:00:00 +0000
- title: ' Computing D-Stationary Points of $ρ$-Margin Loss SVM '
  abstract: ' This paper is concerned with the algorithmic aspects of sharper stationarity of a nonconvex, nonsmooth, Clarke irregular machine learning model. We study the SVM problem with a $\rho$-margin loss function, which is the margin theory generalization bound of SVM introduced in the learning theory textbook by Mohri et al. [2018], and has been extensively studied in operations research, statistics, and machine learning communities. However, due to its nonconvex, nonsmooth, and irregular nature, none of the existing optimization methods can efficiently compute a d(irectional)-stationary point, which turns out to be also a local minimum, for the $\rho$-margin loss SVM problem. After a detailed discussion of various nonsmooth stationarity notions, we propose a highly efficient nonconvex semi-proximal ADMM-based scheme that provably computes d-stationary points and enjoys a local linear convergence rate. We report concrete examples to demonstrate the necessity of our assumptions. Numerical results verify the effectiveness of the new algorithm and complement our theoretical results. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tian22a.html
  PDF: https://proceedings.mlr.press/v151/tian22a/tian22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tian22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lai
    family: Tian
  - given: Anthony
    family: Man-Cho So
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3772-3793
  id: tian22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3772
  lastpage: 3793
  published: 2022-05-03 00:00:00 +0000
- title: ' A Class of Geometric Structures in Transfer Learning: Minimax Bounds and Optimality '
  abstract: ' We study the problem of transfer learning, observing that previous efforts to understand its information-theoretic limits do not fully exploit the geometric structure of the source and target domains. In contrast, our study first illustrates the benefits of incorporating a natural geometric structure within a linear regression model, which corresponds to the generalized eigenvalue problem formed by the Gram matrices of both domains. We next establish a finite-sample minimax lower bound, propose a refined model interpolation estimator that enjoys a matching upper bound, and then extend our framework to multiple source domains and generalized linear models. Surprisingly, as long as information is available on the distance between the source and target parameters, negative-transfer does not occur. Simulation studies show that our proposed interpolation estimator outperforms state-of-the-art transfer learning methods in both moderate- and high-dimensional settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhang22a.html
  PDF: https://proceedings.mlr.press/v151/zhang22a/zhang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xuhui
    family: Zhang
  - given: Jose
    family: Blanchet
  - given: Soumyadip
    family: Ghosh
  - given: Mark S.
    family: Squillante
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3794-3820
  id: zhang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3794
  lastpage: 3820
  published: 2022-05-03 00:00:00 +0000
- title: ' An Information-Theoretic Justification for Model Pruning '
  abstract: ' We study the neural network (NN) compression problem, viewing the tension between the compression ratio and NN performance through the lens of rate-distortion theory. We choose a distortion metric that reflects the effect of NN compression on the model output and then derive the tradeoff between rate (compression ratio) and distortion. In addition to characterizing theoretical limits of NN compression, this formulation shows that pruning, implicitly or explicitly, must be a part of a good compression algorithm. This observation bridges a gap between parts of the literature pertaining to NN and data compression, respectively, providing insight into the empirical success of pruning for NN compression. Finally, we propose a novel pruning strategy derived from our information-theoretic formulation and show that it outperforms the relevant baselines on CIFAR-10 and ImageNet datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/isik22a.html
  PDF: https://proceedings.mlr.press/v151/isik22a/isik22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-isik22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Berivan
    family: Isik
  - given: Tsachy
    family: Weissman
  - given: Albert
    family: No
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3821-3846
  id: isik22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3821
  lastpage: 3846
  published: 2022-05-03 00:00:00 +0000
- title: ' Flexible Accuracy for Differential Privacy '
  abstract: ' Differential Privacy (DP) has become a gold standard in privacy-preserving data analysis. While it provides one of the most rigorous notions of privacy, there are many settings where its applicability is limited. Our main contribution is in augmenting differential privacy with Flexible Accuracy, which allows small distortions in the input (e.g., dropping outliers) before measuring accuracy of the output, allowing one to extend DP mechanisms to high-sensitivity functions. We present mechanisms that can help in achieving this notion for functions that had no meaningful differentially private mechanisms previously. In particular, we illustrate an application to differentially private histograms, which in turn yields mechanisms for revealing the support of a dataset or the extremal values in the data. Analyses of our constructions exploit new versatile composition theorems that facilitate modular design. All the above extensions use our new definitional framework, which is in terms of “lossy Wasserstein distance” – a 2-parameter error measure for distributions. This may be of independent interest. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bansal22a.html
  PDF: https://proceedings.mlr.press/v151/bansal22a/bansal22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bansal22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Aman
    family: Bansal
  - given: Rahul
    family: Chunduru
  - given: Deepesh
    family: Data
  - given: Manoj
    family: Prabhakaran
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3847-3882
  id: bansal22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3847
  lastpage: 3882
  published: 2022-05-03 00:00:00 +0000
- title: ' Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation '
  abstract: ' We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation for linear mixture Markov decision processes (MDPs), where the transition probability function of the underlying MDP admits a linear form over a feature mapping of the current state, action, and next state. We propose a new algorithm UCRL2-VTR, which can be seen as an extension of the UCRL2 algorithm with linear function approximation. We show that UCRL2-VTR with Bernstein-type bonus can achieve a regret of $\tilde{O}(d\sqrt{DT})$, where $d$ is the dimension of the feature mapping, $T$ is the horizon, and $D$ is the diameter of the MDP. We also prove a matching lower bound $\tilde{\Omega}(d\sqrt{DT})$, which suggests that the proposed UCRL2-VTR is minimax optimal up to logarithmic factors. To the best of our knowledge, our algorithm is the first nearly minimax optimal RL algorithm with function approximation in the infinite-horizon average-reward setting. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wu22a.html
  PDF: https://proceedings.mlr.press/v151/wu22a/wu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yue
    family: Wu
  - given: Dongruo
    family: Zhou
  - given: Quanquan
    family: Gu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3883-3913
  id: wu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3883
  lastpage: 3913
  published: 2022-05-03 00:00:00 +0000
- title: ' On Facility Location Problem in the Local Differential Privacy Model '
  abstract: ' We study the facility location problem under the constraints imposed by local differential privacy (LDP). Recently, Gupta et al. (2010) and Esencayi et al. (2019) proposed lower and upper bounds for the problem on the central differential privacy (DP) model where a trusted curator first collects all data and processes it. In this paper, we focus on the LDP model, where we protect a client’s participation in the facility location instance. Under the HST metric, we show that there is a non-interactive $\epsilon$-LDP algorithm achieving $O(n^{1/4}/\epsilon^2)$-approximation ratio, where $n$ is the size of the metric. On the negative side, we show a lower bound of $\Omega(n^{1/4}/\sqrt{\epsilon})$ on the approximation ratio for any non-interactive $\epsilon$-LDP algorithm. Thus, our results are tight up to a polynomial factor of $\epsilon$. Moreover, unlike previous results, our results generalize to non-uniform facility costs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cohen-addad22a.html
  PDF: https://proceedings.mlr.press/v151/cohen-addad22a/cohen-addad22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cohen-addad22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vincent
    family: Cohen-Addad
  - given: Yunus
    family: Esencayi
  - given: Chenglin
    family: Fan
  - given: Marco
    family: Gaboardi
  - given: Shi
    family: Li
  - given: Di
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3914-3929
  id: cohen-addad22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3914
  lastpage: 3929
  published: 2022-05-03 00:00:00 +0000
- title: ' Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent '
  abstract: ' We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Łojasiewicz conditions of the population loss function, namely, the limit of the empirical loss function when the sample size goes to infinity, and the stability between the gradients of the empirical and population loss functions, namely, the polynomial growth on the concentration bound between the gradients of sample and population loss functions. We demonstrate that the Polyak step size gradient descent iterates reach a final statistical radius of convergence around the true parameter after a logarithmic number of iterations in terms of the sample size. This is computationally cheaper than the fixed-step size gradient descent algorithm, which needs a number of iterations polynomial in the sample size to reach the same final statistical radius when the population loss function is not locally strongly convex. Finally, we illustrate our general theory under three statistical examples: generalized linear model, mixture model, and mixed linear regression model. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ren22a.html
  PDF: https://proceedings.mlr.press/v151/ren22a/ren22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ren22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tongzheng
    family: Ren
  - given: Fuheng
    family: Cui
  - given: Alexia
    family: Atsidakou
  - given: Sujay
    family: Sanghavi
  - given: Nhat
    family: Ho
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3930-3961
  id: ren22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3930
  lastpage: 3961
  published: 2022-05-03 00:00:00 +0000
- title: ' MLDemon: Deployment Monitoring for Machine Learning Systems '
  abstract: ' Post-deployment monitoring of ML systems is critical for ensuring reliability, especially as new user inputs can differ from the training distribution. Here we propose a novel approach, MLDemon, for ML DEployment MONitoring. MLDemon integrates both unlabeled data and a small amount of on-demand labels to produce a real-time estimate of the ML model’s current performance on a given data stream. Subject to budget constraints, MLDemon decides when to acquire additional, potentially costly, expert supervised labels to verify the model. On temporal datasets with diverse distribution drifts and models, MLDemon outperforms existing approaches. Moreover, we provide theoretical analysis to show that MLDemon is minimax rate optimal for a broad class of distribution drifts. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ginart22a.html
  PDF: https://proceedings.mlr.press/v151/ginart22a/ginart22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ginart22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tony
    family: Ginart
  - given: Martin
    family: Jinye Zhang
  - given: James
    family: Zou
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3962-3997
  id: ginart22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3962
  lastpage: 3997
  published: 2022-05-03 00:00:00 +0000
- title: ' How to Learn when Data Gradually Reacts to Your Model '
  abstract: ' A recent line of work has focused on training machine learning (ML) models in the performative setting, i.e. when the data distribution reacts to the deployed model. The goal in this setting is to learn a model which both induces a favorable data distribution and performs well on the induced distribution, thereby minimizing the test loss. Previous work on finding an optimal model assumes that the data distribution immediately adapts to the deployed model. In practice, however, this may not be the case, as the population may take time to adapt to the model. In many applications, the data distribution depends on both the currently deployed ML model and on the “state” that the population was in before the model was deployed. In this work, we propose a new algorithm, Stateful Performative Gradient Descent (Stateful PerfGD), for minimizing the performative loss even in the presence of these effects. We provide theoretical guarantees for the convergence of Stateful PerfGD. Our experiments confirm that Stateful PerfGD substantially outperforms previous state-of-the-art methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/izzo22a.html
  PDF: https://proceedings.mlr.press/v151/izzo22a/izzo22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-izzo22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zachary
    family: Izzo
  - given: James
    family: Zou
  - given: Lexing
    family: Ying
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 3998-4035
  id: izzo22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 3998
  lastpage: 4035
  published: 2022-05-03 00:00:00 +0000
- title: ' Mitigating Bias in Calibration Error Estimation '
  abstract: ' For an AI system to be reliable, the confidence it expresses in its decisions must match its accuracy. To assess the degree of match, examples are typically binned by confidence and the per-bin mean confidence and accuracy are compared. Most research in calibration focuses on techniques to reduce this empirical measure of calibration error, ECE_bin. We instead focus on assessing statistical bias in this empirical measure, and we identify better estimators. We propose a framework through which we can compute the bias of a particular estimator for an evaluation data set of a given size. The framework involves synthesizing model outputs that have the same statistics as common neural architectures on popular data sets. We find that binning-based estimators with bins of equal mass (number of instances) have lower bias than estimators with bins of equal width. Our results indicate two reliable calibration-error estimators: the debiased estimator (Brocker, 2012; Ferro and Fricker, 2012) and a method we propose, ECE_sweep, which uses equal-mass bins and chooses the number of bins to be as large as possible while preserving monotonicity in the calibration function. With these estimators, we observe improvements in the effectiveness of recalibration methods and in the detection of model miscalibration. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/roelofs22a.html
  PDF: https://proceedings.mlr.press/v151/roelofs22a/roelofs22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-roelofs22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rebecca
    family: Roelofs
  - given: Nicholas
    family: Cain
  - given: Jonathon
    family: Shlens
  - given: Michael C.
    family: Mozer
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4036-4054
  id: roelofs22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4036
  lastpage: 4054
  published: 2022-05-03 00:00:00 +0000
- title: ' Privacy Amplification by Subsampling in Time Domain '
  abstract: ' Aggregate time-series data like traffic flow and site occupancy repeatedly sample statistics from a population across time. Such data can be profoundly useful for understanding trends within a given population, but also pose a significant privacy risk, potentially revealing e.g., who spends time where. Producing a private version of a time-series satisfying the standard definition of Differential Privacy (DP) is challenging due to the large influence a single participant can have on the sequence: if an individual can contribute to each time step, the amount of additive noise needed to satisfy privacy increases linearly with the number of time steps sampled. As such, if a signal spans a long duration or is oversampled, an excessive amount of noise must be added, drowning out underlying trends. However, in many applications an individual realistically cannot participate at every time step. When this is the case, we observe that the influence of a single participant (sensitivity) can be reduced by subsampling and/or filtering in time, while still meeting privacy requirements. Using a novel analysis, we show this significant reduction in sensitivity and propose a corresponding class of privacy mechanisms. We demonstrate the utility benefits of these techniques empirically with real-world and synthetic time-series data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/koga22a.html
  PDF: https://proceedings.mlr.press/v151/koga22a/koga22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-koga22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tatsuki
    family: Koga
  - given: Casey
    family: Meehan
  - given: Kamalika
    family: Chaudhuri
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4055-4069
  id: koga22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4055
  lastpage: 4069
  published: 2022-05-03 00:00:00 +0000
- title: ' How and When Random Feedback Works: A Case Study of Low-Rank Matrix Factorization '
  abstract: ' The success of gradient descent in ML and especially for learning neural networks is remarkable and robust. In the context of how the brain learns, one aspect of gradient descent that appears biologically difficult to realize (if not implausible) is that its updates rely on feedback from later layers to earlier layers through the same connections. Such bidirected links are relatively few in brain networks, and even when reciprocal connections exist, they may not be equi-weighted. Random Feedback Alignment (Lillicrap et al., 2016), where the backward weights are random and fixed, has been proposed as a bio-plausible alternative and found to be effective empirically. We investigate how and when feedback alignment (FA) works, focusing on one of the most basic problems with layered structure: low-rank matrix factorization. Given a matrix $Y_{n \times m}$, the goal is to find a low rank factorization $Z_{n \times r}W_{r \times m}$ that minimizes the error $\|ZW-Y\|_F$. Gradient descent solves this problem optimally. We show that FA finds the optimal solution when $r\ge \mbox{rank}(Y)$. We also shed light on how FA works. It is observed empirically that the forward weight matrices and (random) feedback matrices come closer during FA updates. Our analysis rigorously derives this phenomenon and shows how it facilitates convergence of FA*, a closely related variant of FA. We also show that FA can be far from optimal when $r < \mbox{rank}(Y)$. This is the first provable separation result between gradient descent and FA. Moreover, the representations found by gradient descent and FA can be almost orthogonal even when their error $\|ZW-Y\|_F$ is approximately equal. As a corollary, these results also hold for training two-layer linear neural networks when the training input is isotropic, and the output is a linear function of the input. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/garg22a.html
  PDF: https://proceedings.mlr.press/v151/garg22a/garg22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-garg22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shivam
    family: Garg
  - given: Santosh
    family: Vempala
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4070-4108
  id: garg22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4070
  lastpage: 4108
  published: 2022-05-03 00:00:00 +0000
- title: ' Gap-Dependent Unsupervised Exploration for Reinforcement Learning '
  abstract: ' For the problem of task-agnostic reinforcement learning (RL), an agent first collects samples from an unknown environment without the supervision of reward signals, then is revealed with a reward and is asked to compute a corresponding near-optimal policy. Existing approaches mainly concern the worst-case scenarios, in which no structural information of the reward/transition-dynamics is utilized. Therefore the best sample upper bound is $\propto\widetilde{\mathcal{O}}(1/\epsilon^2)$, where $\epsilon>0$ is the target accuracy of the obtained policy, and can be overly pessimistic. To tackle this issue, we provide an efficient algorithm that utilizes a gap parameter, $\rho>0$, to reduce the amount of exploration. In particular, for an unknown finite-horizon Markov decision process, the algorithm takes only $\widetilde{\mathcal{O}} (1/\epsilon \cdot (H^3SA / \rho + H^4 S^2 A) )$ episodes of exploration, and is able to obtain an $\epsilon$-optimal policy for a post-revealed reward with sub-optimality gap at least $\rho$, where $S$ is the number of states, $A$ is the number of actions, and $H$ is the length of the horizon, obtaining a nearly quadratic saving in terms of $\epsilon$. We show that, information-theoretically, this bound is nearly tight for $\rho < \Theta(1/(HS))$ and $H>1$. We further show that $\propto\widetilde{\mathcal{O}}(1)$ sample bound is possible for $H=1$ (i.e., multi-armed bandit) or with a sampling simulator, establishing a stark separation between those settings and the RL setting. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wu22b.html
  PDF: https://proceedings.mlr.press/v151/wu22b/wu22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wu22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jingfeng
    family: Wu
  - given: Vladimir
    family: Braverman
  - given: Lin
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4109-4131
  id: wu22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4109
  lastpage: 4131
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Generalization of Representations in Reinforcement Learning '
  abstract: ' In reinforcement learning, state representations are used to tractably deal with large problem spaces. State representations serve both to approximate the value function with few parameters and to generalize to newly encountered states. Their features may be learned implicitly (as part of a neural network) or explicitly (for example, the successor representation of Dayan (1993)). While the approximation properties of representations are reasonably well-understood, a precise characterization of how and when these representations generalize is lacking. In this work, we address this gap and provide an informative bound on the generalization error arising from a specific state representation. This bound is based on the notion of effective dimension which measures the degree to which knowing the value at one state informs the value at other states. Our bound applies to any state representation and quantifies the natural tension between representations that generalize well and those that approximate well. We complement our theoretical results with an empirical survey of classic representation learning methods from the literature and results on the Arcade Learning Environment, and find that the generalization behaviour of learned representations is well-explained by their effective dimension. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/le-lan22a.html
  PDF: https://proceedings.mlr.press/v151/le-lan22a/le-lan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-le-lan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Charline
    family: Le Lan
  - given: Stephen
    family: Tu
  - given: Adam
    family: Oberman
  - given: Rishabh
    family: Agarwal
  - given: Marc G.
    family: Bellemare
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4132-4157
  id: le-lan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4132
  lastpage: 4157
  published: 2022-05-03 00:00:00 +0000
- title: ' Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects '
  abstract: ' Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals. However, typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable. This requirement can be satisfied by collecting many variables, at the expense of increased sample complexity for estimating CATEs. To combat this, we propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function. With our EBM we introduce a preprocessing step that alleviates the dimensionality curse for any existing learner developed for estimating CATEs. We prove that our EBM keeps the representations partially identifiable up to some universal constant, as well as having universal approximation capability. These properties enable the representations to converge and keep the CATE estimates consistent. Experiments demonstrate the convergence of the representations, as well as show that estimating CATEs on our representations performs better than on the variables or the representations obtained through other dimensionality reduction methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhang22b.html
  PDF: https://proceedings.mlr.press/v151/zhang22b/zhang22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhang22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yao
    family: Zhang
  - given: Jeroen
    family: Berrevoets
  - given: Mihaela
    family: Van Der Schaar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4158-4177
  id: zhang22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4158
  lastpage: 4177
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning a Single Neuron for Non-monotonic Activation Functions '
  abstract: ' We study the problem of learning a single neuron $\mathbf{x}\mapsto \sigma(\mathbf{w}^T\mathbf{x})$ with gradient descent (GD). All the existing positive results are limited to the case where $\sigma$ is monotonic. However, it has recently been observed that non-monotonic activation functions outperform the traditional monotonic ones in many applications. To fill this gap, we establish learnability without assuming monotonicity. Specifically, when the input distribution is the standard Gaussian, we show that mild conditions on $\sigma$ (e.g., $\sigma$ has a dominating linear part) are sufficient to guarantee learnability in polynomial time and with polynomially many samples. Moreover, with a stronger assumption on the activation function, the condition on the input distribution can be relaxed to a non-degeneracy of the marginal distribution. We remark that our conditions on $\sigma$ are satisfied by practical non-monotonic activation functions, such as SiLU/Swish and GELU. We also discuss how our positive results are related to existing negative results on training two-layer neural networks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wu22c.html
  PDF: https://proceedings.mlr.press/v151/wu22c/wu22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wu22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lei
    family: Wu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4178-4197
  id: wu22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4178
  lastpage: 4197
  published: 2022-05-03 00:00:00 +0000
- title: ' Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream Data? A Theoretical Analysis '
  abstract: ' Pretext-based self-supervised learning learns the semantic representation via a handcrafted pretext task over unlabeled data and then uses the learned representation for downstream tasks, which effectively reduces the sample complexity of downstream tasks under the Conditional Independence (CI) condition. However, the downstream sample complexity gets much worse if the CI condition does not hold. One interesting question is whether we can make the CI condition hold by using downstream data to refine the unlabeled data to boost self-supervised learning. At first glance, one might think that seeing downstream data in advance would always boost the downstream performance. However, we show that this intuition does not always hold and point out that in some cases, it hurts the final performance instead. In particular, we prove both model-free and model-dependent lower bounds on the number of downstream samples used for data refinement. Moreover, we conduct various experiments on both synthetic and real-world datasets to verify our theoretical results. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/teng22a.html
  PDF: https://proceedings.mlr.press/v151/teng22a/teng22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-teng22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiaye
    family: Teng
  - given: Weiran
    family: Huang
  - given: Haowei
    family: He
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4198-4216
  id: teng22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4198
  lastpage: 4216
  published: 2022-05-03 00:00:00 +0000
- title: ' Model-free Policy Learning with Reward Gradients '
  abstract: ' Despite the increasing popularity of policy gradient methods, they are yet to be widely utilized in sample-scarce applications, such as robotics. The sample efficiency could be improved by making the best use of the available information. As a key component in reinforcement learning, the reward function is usually devised carefully to guide the agent. Hence, the reward function is usually known, allowing access to not only scalar reward signals but also reward gradients. To benefit from reward gradients, previous works require the knowledge of environment dynamics, which are hard to obtain. In this work, we develop the Reward Policy Gradient estimator, a novel approach that integrates reward gradients without learning a model. Bypassing the model dynamics allows our estimator to achieve a better bias-variance trade-off, which results in a higher sample efficiency, as shown in the empirical analysis. Our method also boosts the performance of Proximal Policy Optimization on different MuJoCo control tasks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lan22a.html
  PDF: https://proceedings.mlr.press/v151/lan22a/lan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Qingfeng
    family: Lan
  - given: Samuele
    family: Tosatto
  - given: Homayoon
    family: Farrahi
  - given: Rupam
    family: Mahmood
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4217-4234
  id: lan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4217
  lastpage: 4234
  published: 2022-05-03 00:00:00 +0000
- title: ' Preference Exploration for Efficient Bayesian Optimization with Multiple Outcomes '
  abstract: ' We consider Bayesian optimization of expensive-to-evaluate experiments that generate vector-valued outcomes over which a decision-maker (DM) has preferences. These preferences are encoded by a utility function that is not known in closed form but can be estimated by asking the DM to express preferences over pairs of outcome vectors. To address this problem, we develop Bayesian optimization with preference exploration, a novel framework that alternates between interactive real-time preference learning with the DM via pairwise comparisons between outcomes, and Bayesian optimization with a learned compositional model of DM utility and outcomes. Within this framework, we propose preference exploration strategies specifically designed for this task, and demonstrate their performance via extensive simulation studies. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/jerry-lin22a.html
  PDF: https://proceedings.mlr.press/v151/jerry-lin22a/jerry-lin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-jerry-lin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zhiyuan
    family: Jerry Lin
  - given: Raul
    family: Astudillo
  - given: Peter
    family: Frazier
  - given: Eytan
    family: Bakshy
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4235-4258
  id: jerry-lin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4235
  lastpage: 4258
  published: 2022-05-03 00:00:00 +0000
- title: ' Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs '
  abstract: ' Learning Markov decision processes (MDPs) in the presence of an adversary is a challenging problem in reinforcement learning (RL). In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode. We propose an optimistic policy optimization algorithm POWERS and show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode, $T$ is the number of interactions with the MDP, and $d$ is the dimension of the feature mapping. Furthermore, we also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors. Our key technical contributions are two-fold: (1) a new value function estimator based on importance weighting; and (2) a tighter confidence set for the transition kernel. They together lead to the nearly minimax optimal regret. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/he22a.html
  PDF: https://proceedings.mlr.press/v151/he22a/he22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-he22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiafan
    family: He
  - given: Dongruo
    family: Zhou
  - given: Quanquan
    family: Gu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4259-4280
  id: he22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4259
  lastpage: 4280
  published: 2022-05-03 00:00:00 +0000
- title: ' Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax Optimization '
  abstract: ' We study the bilinearly coupled minimax problem: $\min_{x} \max_{y} f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions and admit first-order gradient oracles. Surprisingly, no known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega((\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x \mu_y}} + \sqrt{\frac{L_y}{\mu_y}}) \log(\frac1{\varepsilon}))$ for solving this problem up to an $\varepsilon$ primal-dual gap in the general parameter regime, where $L_x, L_y,\mu_x,\mu_y$ are the corresponding smoothness and strong convexity constants. We close this gap by devising the first optimal algorithm, the Lifted Primal-Dual (LPD) method. Our method lifts the objective into an extended form that allows both the smooth terms and the bilinear term to be handled optimally and seamlessly with the same primal-dual framework. Besides optimality, our method yields a desirably simple single-loop algorithm that uses only one gradient oracle call per iteration. Moreover, when $f$ is just convex, the same algorithm applied to a smoothed objective achieves the nearly optimal iteration complexity. We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}})$. Numerical experiments on quadratic minimax problems and policy evaluation problems further demonstrate the fast convergence of our algorithm in practice. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/thekumparampil22a.html
  PDF: https://proceedings.mlr.press/v151/thekumparampil22a/thekumparampil22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-thekumparampil22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kiran K.
    family: Thekumparampil
  - given: Niao
    family: He
  - given: Sewoong
    family: Oh
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4281-4308
  id: thekumparampil22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4281
  lastpage: 4308
  published: 2022-05-03 00:00:00 +0000
- title: ' Denoising and change point localisation in piecewise-constant high-dimensional regression coefficients '
  abstract: ' We study the theoretical properties of the fused lasso procedure originally proposed by Tibshirani et al. (2005) in the context of a linear regression model in which the regression coefficients are totally ordered and assumed to be sparse and piecewise constant. Despite its popularity, to the best of our knowledge, estimation error bounds in high-dimensional settings have only been obtained for the simple case in which the design matrix is the identity matrix. We formulate a novel restricted isometry condition on the design matrix that is tailored to the fused lasso estimator and derive estimation bounds for both the constrained version of the fused lasso assuming dense coefficients and for its penalised version. We observe that the estimation error can be dominated by either the lasso or the fused lasso rate, depending on whether the number of non-zero coefficients is larger than the number of piece-wise constant segments. Finally, we devise a post-processing procedure to recover the piecewise-constant pattern of the coefficients. Extensive numerical experiments support our theoretical findings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22c.html
  PDF: https://proceedings.mlr.press/v151/wang22c/wang22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Fan
    family: Wang
  - given: Oscar
    family: Madrid
  - given: Yi
    family: Yu
  - given: Alessandro
    family: Rinaldo
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4309-4338
  id: wang22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4309
  lastpage: 4338
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal partition recovery in general graphs '
  abstract: ' We consider a graph-structured change point problem in which we observe a random vector with piece-wise constant but otherwise unknown mean and whose independent, sub-Gaussian coordinates correspond to the $n$ nodes of a fixed graph. We are interested in the localisation task of recovering the partition of the nodes associated to the constancy regions of the mean vector or, equivalently, of estimating the cut separating the sub-graphs over which the mean remains constant. Although graph-valued signals of this type have been previously studied in the literature for the different tasks of testing for the presence of an anomalous cluster and of estimating the mean vector, no localisation results are known outside the classical case of chain graphs. When the partition $\mathcal{S}$ consists of only two elements, we characterise the difficulty of the localisation problem in terms of four key parameters: the maximal noise variance $\sigma^2$, the size $\Delta$ of the smaller element of the partition, the magnitude $\kappa$ of the difference in the signal values across contiguous elements of the partition and the sum of the effective resistance edge weights $|\partial_r(\mathcal{S})|$ of the corresponding cut – a graph theoretic quantity quantifying the size of the partition boundary. In particular, we demonstrate an information theoretical lower bound implying that, in the low signal-to-noise ratio regime $\kappa^2 \Delta \sigma^{-2} |\partial_r(\mathcal{S})|^{-1} \lesssim 1$, no consistent estimator of the true partition exists. 
    On the other hand, when $\kappa^2 \Delta \sigma^{-2} |\partial_r(\mathcal{S})|^{-1} \gtrsim \zeta_n \log\{r(|E|)\}$, with $r(|E|)$ being the sum of effective resistance weighted edges and $\zeta_n$ being any diverging sequence in $n$, we show that a polynomial-time, approximate $\ell_0$-penalised least squared estimator delivers a localisation error – measured by the symmetric difference between the true and estimated partition – of order $ \kappa^{-2} \sigma^2 |\partial_r(\mathcal{S})| \log\{r(|E|)\}$. Aside from the $\log\{r(|E|)\}$ term, this rate is minimax optimal. Finally, we provide discussions on the localisation error for more general partitions of unknown sizes. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yu22b.html
  PDF: https://proceedings.mlr.press/v151/yu22b/yu22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yu22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yi
    family: Yu
  - given: Oscar
    family: Madrid
  - given: Alessandro
    family: Rinaldo
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4339-4358
  id: yu22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4339
  lastpage: 4358
  published: 2022-05-03 00:00:00 +0000
- title: ' On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization '
  abstract: ' Sequential model-based optimization sequentially selects a candidate point by constructing a surrogate model with the history of evaluations, to solve a black-box optimization problem. Gaussian process (GP) regression is a popular choice as a surrogate model, because of its capability of calculating prediction uncertainty analytically. On the other hand, an ensemble of randomized trees is another option and has practical merits over GPs due to its scalability and ease of handling continuous/discrete mixed variables. In this paper we revisit various ensembles of randomized trees to investigate their behavior from the perspective of prediction uncertainty estimation. Then, we propose a new way of constructing an ensemble of randomized trees, referred to as BwO forest, where bagging with oversampling is employed to construct bootstrapped samples that are used to build randomized trees with random splitting. Experimental results demonstrate the validity and good performance of BwO forest over existing tree-based models in various circumstances. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kim22b.html
  PDF: https://proceedings.mlr.press/v151/kim22b/kim22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kim22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jungtaek
    family: Kim
  - given: Seungjin
    family: Choi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4359-4375
  id: kim22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4359
  lastpage: 4375
  published: 2022-05-03 00:00:00 +0000
- title: ' Offline Policy Selection under Uncertainty '
  abstract: ' The presence of uncertainty in policy evaluation significantly complicates the process of policy ranking and selection in real-world settings. We formally consider offline policy selection as learning preferences over a set of policy prospects given a fixed experience dataset. While one can select or rank policies based on point estimates of their expected values or high-confidence intervals, access to the full distribution over one’s belief of the policy value enables more flexible selection algorithms under a wider range of downstream evaluation metrics. We propose a Bayesian approach for estimating this belief distribution in terms of posteriors of distribution correction ratios derived from stochastic constraints. Empirically, despite being Bayesian, the credible intervals obtained are competitive with state-of-the-art frequentist approaches in confidence interval estimation. More importantly, we show how the belief distribution may be used to rank policies with respect to arbitrary downstream policy selection metrics, and empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yang22a.html
  PDF: https://proceedings.mlr.press/v151/yang22a/yang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mengjiao
    family: Yang
  - given: Bo
    family: Dai
  - given: Ofir
    family: Nachum
  - given: George
    family: Tucker
  - given: Dale
    family: Schuurmans
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4376-4396
  id: yang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4376
  lastpage: 4396
  published: 2022-05-03 00:00:00 +0000
- title: ' On Multimarginal Partial Optimal Transport: Equivalent Forms and Computational Complexity '
  abstract: ' We study the multi-marginal partial optimal transport (POT) problem between $m$ discrete (unbalanced) measures with at most $n$ supports. We first prove that we can obtain two equivalent forms of the multimarginal POT problem in terms of the multimarginal optimal transport problem via novel extensions of cost tensors. The first equivalent form is derived under the assumptions that the total masses of each measure are sufficiently close while the second equivalent form does not require any conditions on these masses but at the price of a more sophisticated extended cost tensor. Our proof techniques for obtaining these equivalent forms rely on novel procedures of moving mass in graph theory to push the transportation plan into appropriate regions. Finally, based on the equivalent forms, we develop an optimization algorithm, named the ApproxMPOT algorithm, that builds upon the Sinkhorn algorithm for solving the entropic regularized multimarginal optimal transport. We demonstrate that the ApproxMPOT algorithm can approximate the optimal value of the multimarginal POT problem with a computational complexity upper bound of the order $\tilde{O}(m^3(n+1)^{m}/ \varepsilon^2)$ where $\varepsilon > 0$ stands for the desired tolerance. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/le22a.html
  PDF: https://proceedings.mlr.press/v151/le22a/le22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-le22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Khang
    family: Le
  - given: Huy
    family: Nguyen
  - given: Khai
    family: Nguyen
  - given: Tung
    family: Pham
  - given: Nhat
    family: Ho
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4397-4413
  id: le22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4397
  lastpage: 4413
  published: 2022-05-03 00:00:00 +0000
- title: ' Independent Natural Policy Gradient always converges in Markov Potential Games '
  abstract: ' Natural policy gradient has emerged as one of the most successful algorithms for computing optimal policies in challenging Reinforcement Learning (RL) tasks, yet very little was known about its convergence properties until recently. The picture becomes more blurry when it comes to multi-agent RL (MARL); the line of work with theoretical guarantees for convergence to Nash policies is very limited. In this paper, we focus on a particular class of multi-agent stochastic games called Markov Potential Games and we prove that Independent Natural Policy Gradient always converges using constant learning rates. The proof deviates from the existing approaches and the main challenge lies in the fact that Markov Potential Games do not have unique optimal values (as single-agent settings exhibit) so different initializations can lead to different limit point values. We complement our theoretical results with experiments that indicate that Natural Policy Gradient outperforms Policy Gradient in MARL settings (our process benchmark is multi-state congestion games). '
  volume: 151
  URL: https://proceedings.mlr.press/v151/fox22a.html
  PDF: https://proceedings.mlr.press/v151/fox22a/fox22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-fox22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Roy
    family: Fox
  - given: Stephen M.
    family: Mcaleer
  - given: Will
    family: Overman
  - given: Ioannis
    family: Panageas
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4414-4425
  id: fox22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4414
  lastpage: 4425
  published: 2022-05-03 00:00:00 +0000
- title: ' Semi-Implicit Hybrid Gradient Methods with Application to Adversarial Robustness '
  abstract: ' Adversarial examples, crafted by adding imperceptible perturbations to natural inputs, can easily fool deep neural networks (DNNs). One of the most successful methods for training adversarially robust DNNs is solving a nonconvex-nonconcave minimax problem with an adversarial training (AT) algorithm. However, among the many AT algorithms, only Dynamic AT (DAT) and You Only Propagate Once (YOPO) are guaranteed to converge to a stationary point with rate O(1/K^{1/2}). In this work, we generalize the stochastic primal-dual hybrid gradient algorithm to develop semi-implicit hybrid gradient methods (SI-HGs) for finding stationary points of nonconvex-nonconcave minimax problems. SI-HGs have the convergence rate O(1/K), which improves upon the rate O(1/K^{1/2}) of DAT and YOPO. We devise a practical variant of SI-HGs, and show that it outperforms other AT algorithms in terms of convergence speed and robustness. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kim22c.html
  PDF: https://proceedings.mlr.press/v151/kim22c/kim22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kim22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Beomsu
    family: Kim
  - given: Junghoon
    family: Seo
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4426-4445
  id: kim22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4426
  lastpage: 4445
  published: 2022-05-03 00:00:00 +0000
- title: ' Projection Predictive Inference for Generalized Linear and Additive Multilevel Models '
  abstract: ' Projection predictive inference is a decision theoretic Bayesian approach that decouples model estimation from decision making. Given a reference model previously built including all variables present in the data, projection predictive inference projects its posterior onto a constrained space of a subset of variables. Variable selection is then performed by sequentially adding relevant variables until predictive performance is satisfactory. Previously, projection predictive inference has been demonstrated only for generalized linear models (GLMs) and Gaussian processes (GPs) where it showed superior performance to competing variable selection procedures. In this work, we extend projection predictive inference to support variable and structure selection for generalized linear multilevel models (GLMMs) and generalized additive multilevel models (GAMMs). Our simulative and real-world experiments demonstrate that our method can drastically reduce the model complexity required to reach reference predictive performance and achieve good frequency properties. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/catalina22a.html
  PDF: https://proceedings.mlr.press/v151/catalina22a/catalina22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-catalina22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alejandro
    family: Catalina
  - given: Paul-Christian
    family: Bürkner
  - given: Aki
    family: Vehtari
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4446-4461
  id: catalina22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4446
  lastpage: 4461
  published: 2022-05-03 00:00:00 +0000
- title: ' Point Cloud Generation with Continuous Conditioning '
  abstract: ' Generative models can be used to synthesize 3D objects of high quality and diversity. However, there is typically no control over the properties of the generated object. This paper proposes a novel generative adversarial network (GAN) setup that generates 3D point cloud shapes conditioned on a continuous parameter. In an exemplary application, we use this to guide the generative process to create a 3D object with a custom-fit shape. We formulate this generation process in a multi-task setting by using the concept of auxiliary classifier GANs. Further, we propose to sample the generator label input for training from a kernel density estimation (KDE) of the dataset. Our ablations show that this leads to a significant performance increase in regions with few samples. Extensive quantitative and qualitative experiments show that we gain explicit control over the object dimensions while maintaining good generation quality and diversity. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/triess22a.html
  PDF: https://proceedings.mlr.press/v151/triess22a/triess22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-triess22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Larissa T.
    family: Triess
  - given: Andre
    family: Bühler
  - given: David
    family: Peter
  - given: Fabian B.
    family: Flohr
  - given: Marius
    family: Zöllner
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4462-4481
  id: triess22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4462
  lastpage: 4481
  published: 2022-05-03 00:00:00 +0000
- title: ' An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints '
  abstract: ' Optimization problems under affine constraints appear in various areas of machine learning. We consider the task of minimizing a smooth strongly convex function F(x) under the affine constraint Kx = b, with an oracle providing evaluations of the gradient of F and multiplications by K and its transpose. We provide lower bounds on the number of gradient computations and matrix multiplications to achieve a given accuracy. Then we propose an accelerated primal-dual algorithm achieving these lower bounds. Our algorithm is the first optimal algorithm for this class of problems. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/salim22a.html
  PDF: https://proceedings.mlr.press/v151/salim22a/salim22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-salim22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Adil
    family: Salim
  - given: Laurent
    family: Condat
  - given: Dmitry
    family: Kovalev
  - given: Peter
    family: Richtarik
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4482-4498
  id: salim22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4482
  lastpage: 4498
  published: 2022-05-03 00:00:00 +0000
- title: ' CF-GNNExplainer: Counterfactual Explanations for Graph Neural Networks '
  abstract: ' Given the increasing promise of graph neural networks (GNNs) in real-world applications, several methods have been developed for explaining their predictions. Existing methods for interpreting predictions from GNNs have primarily focused on generating subgraphs that are especially relevant for a particular prediction. However, such methods are not counterfactual (CF) in nature: given a prediction, we want to understand how the prediction can be changed in order to achieve an alternative outcome. In this work, we propose a method for generating CF explanations for GNNs: the minimal perturbation to the input (graph) data such that the prediction changes. Using only edge deletions, we find that our method, CF-GNNExplainer, can generate CF explanations for the majority of instances across three widely used datasets for GNN explanations, while removing fewer than 3 edges on average, with at least $94\%$ accuracy. This indicates that CF-GNNExplainer primarily removes edges that are crucial for the original predictions, resulting in minimal CF explanations. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lucic22a.html
  PDF: https://proceedings.mlr.press/v151/lucic22a/lucic22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lucic22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ana
    family: Lucic
  - given: Maartje A.
    family: Ter Hoeve
  - given: Gabriele
    family: Tolomei
  - given: Maarten
    family: De Rijke
  - given: Fabrizio
    family: Silvestri
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4499-4511
  id: lucic22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4499
  lastpage: 4511
  published: 2022-05-03 00:00:00 +0000
- title: ' Safe Active Learning for Multi-Output Gaussian Processes '
  abstract: ' Multi-output regression problems are commonly encountered in science and engineering. In particular, multi-output Gaussian processes have emerged as a promising tool for modeling these complex systems since they can exploit the inherent correlations and provide reliable uncertainty estimates. In many applications, however, acquiring the data is expensive and safety concerns might arise (e.g. robotics, engineering). We propose a safe active learning approach for multi-output Gaussian process regression. This approach queries the most informative data or output taking the relatedness between the regressors and safety constraints into account. We prove the effectiveness of our approach by providing theoretical analysis and by demonstrating empirical results on simulated datasets and on a real-world engineering dataset. On all datasets, our approach shows improved convergence compared to its competitors. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/li22d.html
  PDF: https://proceedings.mlr.press/v151/li22d/li22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-li22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Cen-You
    family: Li
  - given: Barbara
    family: Rakitsch
  - given: Christoph
    family: Zimmer
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4512-4551
  id: li22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4512
  lastpage: 4551
  published: 2022-05-03 00:00:00 +0000
- title: ' Variational Continual Proxy-Anchor for Deep Metric Learning '
  abstract: ' The recent proxy-anchor method achieved outstanding performance in deep metric learning, which can be attributed to its data-efficient loss based on hard example mining, as well as far lower sampling complexity than pair-based approaches. In this paper we extend the proxy-anchor method by posing it within the continual learning framework, motivated by its batch-expected loss form (instead of instance-expected, typical in deep learning), which can potentially incur catastrophic forgetting of historic batches. By regarding each batch as a task in continual learning, we adopt the Bayesian variational continual learning approach to derive a novel loss function. Interestingly, the resulting loss has two key modifications to the original proxy-anchor loss: i) we inject noise into the proxies when optimizing the proxy-anchor loss, and ii) we encourage momentum update to avoid abrupt model changes. As a result, the learned model achieves higher test accuracy than proxy-anchor due to the robustness to noise in data (through model perturbation during training), and the reduced batch forgetting effect. We demonstrate the improved results on several benchmark datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kim22d.html
  PDF: https://proceedings.mlr.press/v151/kim22d/kim22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kim22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Minyoung
    family: Kim
  - given: Ricardo
    family: Guerrero
  - given: Hai X.
    family: Pham
  - given: Vladimir
    family: Pavlovic
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4552-4573
  id: kim22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4552
  lastpage: 4573
  published: 2022-05-03 00:00:00 +0000
- title: ' Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis '
  abstract: ' As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice. Despite the growing popularity of counterfactual explanations, the theoretical understanding of these explanations is still lagging behind. In this work, we systematically analyze counterfactual explanations through the lens of adversarial examples. We do so by formalizing the similarities between popular counterfactual explanation and adversarial example generation methods, identifying conditions under which they are equivalent. We then derive upper bounds between the solutions output by counterfactual explanation and adversarial example generation methods, which we validate on several real world data sets. By establishing these theoretical and empirical similarities between counterfactual explanations and adversarial examples, our work raises fundamental questions about the design and development of existing counterfactual explanation algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/pawelczyk22a.html
  PDF: https://proceedings.mlr.press/v151/pawelczyk22a/pawelczyk22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-pawelczyk22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Martin
    family: Pawelczyk
  - given: Chirag
    family: Agarwal
  - given: Shalmali
    family: Joshi
  - given: Sohini
    family: Upadhyay
  - given: Himabindu
    family: Lakkaraju
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4574-4594
  id: pawelczyk22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4574
  lastpage: 4594
  published: 2022-05-03 00:00:00 +0000
- title: ' Variational Autoencoders: A Harmonic Perspective '
  abstract: ' In this work we study Variational Autoencoders (VAEs) from the perspective of harmonic analysis. By viewing a VAE’s latent space as a Gaussian Space, a variety of measure space, we derive a series of results that show that the encoder variance of a VAE controls the frequency content of the functions parameterised by the VAE encoder and decoder neural networks. In particular we demonstrate that larger encoder variances reduce the high frequency content of these functions. Our analysis allows us to show that increasing this variance effectively induces a soft Lipschitz constraint on the decoder network of a VAE, which is a core contributor to the adversarial robustness of VAEs. We further demonstrate that adding Gaussian noise to the input of a VAE allows us to more finely control the frequency content and the Lipschitz constant of the VAE encoder networks. Finally, we show that the KL term of the VAE loss serves as a single point of action for modulating the frequency content of both encoder and decoder networks, whereby upweighting this term decreases the high-frequency content of both networks. To support our theoretical analysis we run experiments using VAEs with small fully-connected neural networks and with larger convolutional networks, demonstrating empirically that our theory holds for a variety of neural network architectures. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/camuto22a.html
  PDF: https://proceedings.mlr.press/v151/camuto22a/camuto22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-camuto22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Camuto
  - given: Matthew
    family: Willetts
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4595-4611
  id: camuto22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4595
  lastpage: 4611
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Consistency of Max-Margin Losses '
  abstract: ' The foundational concept of Max-Margin in machine learning is ill-posed for output spaces with more than two labels such as in structured prediction. In this paper, we show that the Max-Margin loss can only be consistent to the classification task under highly restrictive assumptions on the discrete loss measuring the error between outputs. These conditions are satisfied by distances defined in tree graphs, for which we prove consistency, thus being the first losses shown to be consistent for Max-Margin beyond the binary setting. We finally address these limitations by correcting the concept of Max-Margin and introducing the Restricted-Max-Margin, where the maximization of the loss-augmented scores is maintained, but performed over a subset of the original domain. The resulting loss is also a generalization of the binary support vector machine and it is consistent under milder conditions on the discrete loss. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nowak22a.html
  PDF: https://proceedings.mlr.press/v151/nowak22a/nowak22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nowak22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alex
    family: Nowak
  - given: Alessandro
    family: Rudi
  - given: Francis
    family: Bach
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4612-4633
  id: nowak22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4612
  lastpage: 4633
  published: 2022-05-03 00:00:00 +0000
- title: ' A prior-based approximate latent Riemannian metric '
  abstract: ' Stochastic generative models enable us to capture the geometric structure of a data manifold lying in a high dimensional space through a Riemannian metric in the latent space. However, its practical use is rather limited mainly due to inevitable functionality problems and computational complexity. In this work we propose a surrogate conformal Riemannian metric in the latent space of a generative model that is simple, efficient and robust. This metric is based on a learnable prior that we propose to learn using a basic energy-based model. We theoretically analyze the behavior of the proposed metric and show that it is sensible to use in practice. We demonstrate experimentally the efficiency and robustness, as well as the behavior of the new approximate metric. Also, we show the applicability of the proposed methodology for data analysis in the life sciences. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/arvanitidis22a.html
  PDF: https://proceedings.mlr.press/v151/arvanitidis22a/arvanitidis22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-arvanitidis22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Georgios
    family: Arvanitidis
  - given: Bogdan M.
    family: Georgiev
  - given: Bernhard
    family: Schölkopf
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4634-4658
  id: arvanitidis22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4634
  lastpage: 4658
  published: 2022-05-03 00:00:00 +0000
- title: ' On Convergence of Lookahead in Smooth Games '
  abstract: ' A key challenge in smooth games is that there is no general guarantee for gradient methods to converge to an equilibrium. Recently, Chavdarova et al. (2021) reported a promising empirical observation that Lookahead (Zhang et al., 2019) significantly improves GAN training. While promising, few theoretical guarantees have been studied for Lookahead in smooth games. In this work, we establish the first convergence guarantees of Lookahead for smooth games. We present a spectral analysis and provide a geometric explanation of how and when it actually improves the convergence around a stationary point. Based on the analysis, we derive sufficient conditions for Lookahead to stabilize or accelerate the local convergence in smooth games. Our study reveals that Lookahead provides a general mechanism for stabilization and acceleration in smooth games. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ha22a.html
  PDF: https://proceedings.mlr.press/v151/ha22a/ha22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ha22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Junsoo
    family: Ha
  - given: Gunhee
    family: Kim
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4659-4684
  id: ha22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4659
  lastpage: 4684
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Proposals for Practical Energy-Based Regression '
  abstract: ' Energy-based models (EBMs) have experienced a resurgence within machine learning in recent years, including as a promising alternative for probabilistic regression. However, energy-based regression requires a proposal distribution to be manually designed for training, and an initial estimate has to be provided at test-time. We address both of these issues by introducing a conceptually simple method to automatically learn an effective proposal distribution, which is parameterized by a separate network head. To this end, we derive a surprising result, leading to a unified training objective that jointly minimizes the KL divergence from the proposal to the EBM, and the negative log-likelihood of the EBM. At test-time, we can then employ importance sampling with the trained proposal to efficiently evaluate the learned EBM and produce stand-alone predictions. Furthermore, we utilize our derived training objective to learn mixture density networks (MDNs) with a jointly trained energy-based teacher, consistently outperforming conventional MDN training on four real-world regression tasks within computer vision. Code is available at https://github.com/fregu856/ebms_proposals. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/gustafsson22a.html
  PDF: https://proceedings.mlr.press/v151/gustafsson22a/gustafsson22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-gustafsson22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Fredrik K.
    family: Gustafsson
  - given: Martin
    family: Danelljan
  - given: Thomas B.
    family: Schön
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4685-4704
  id: gustafsson22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4685
  lastpage: 4704
  published: 2022-05-03 00:00:00 +0000
- title: ' Predicting the impact of treatments over time with uncertainty aware neural differential equations. '
  abstract: ' Predicting the impact of treatments from observational data only still represents a major challenge despite recent significant advances in time series modeling. Treatment assignments are usually correlated with the predictors of the response, resulting in a lack of data support for counterfactual predictions and therefore in poor quality estimates. Developments in causal inference have led to methods addressing this confounding by requiring a minimum level of overlap. However, overlap is difficult to assess and usually not satisfied in practice. In this work, we propose Counterfactual ODE (CF-ODE), a novel method to predict the impact of treatments continuously over time using Neural Ordinary Differential Equations equipped with uncertainty estimates. This allows us to specifically assess which treatment outcomes can be reliably predicted. We demonstrate over several longitudinal datasets that CF-ODE provides more accurate predictions and more reliable uncertainty estimates than previously available methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/de-brouwer22a.html
  PDF: https://proceedings.mlr.press/v151/de-brouwer22a/de-brouwer22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-de-brouwer22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Edward
    family: De Brouwer
  - given: Javier
    family: Gonzalez
  - given: Stephanie
    family: Hyland
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4705-4722
  id: de-brouwer22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4705
  lastpage: 4722
  published: 2022-05-03 00:00:00 +0000
- title: ' Improved Algorithms for Misspecified Linear Markov Decision Processes '
  abstract: ' For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after $K$ episodes scales as $K \max\{\varepsilon_{\mathrm{mis}}, \varepsilon_{\mathrm{tol}}\}$, where $\varepsilon_{\mathrm{mis}}$ is the degree of misspecification and $\varepsilon_{\mathrm{tol}}$ is a user-specified error tolerance. (P2) Its space and per-episode time complexities remain bounded as $K\rightarrow\infty$. (P3) It does not require $\varepsilon_{\mathrm{mis}}$ as input. To our knowledge, this is the first algorithm satisfying all three properties. For concrete choices of $\varepsilon_{\mathrm{tol}}$, we also improve existing regret bounds (up to log factors) while achieving either (P2) or (P3) (existing algorithms satisfy neither). At a high level, our algorithm generalizes (to MLMDPs) and refines the Sup-Lin-UCB algorithm, which Takemura et al. [2021] recently showed satisfies (P3) in the contextual bandit setting. We also provide an intuitive interpretation of their result, which informs the design of our algorithm. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/vial22a.html
  PDF: https://proceedings.mlr.press/v151/vial22a/vial22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-vial22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: Vial
  - given: Advait
    family: Parulekar
  - given: Sanjay
    family: Shakkottai
  - given: R
    family: Srikant
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4723-4746
  id: vial22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4723
  lastpage: 4746
  published: 2022-05-03 00:00:00 +0000
- title: ' Synthsonic: Fast, Probabilistic modeling and Synthesis of Tabular Data '
  abstract: ' The creation of realistic, synthetic datasets has several purposes with growing demand in recent times, e.g. privacy protection and other cases where real data cannot be easily shared. A multitude of primarily neural network (NN) approaches, e.g. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), or Bayesian Network (BN) approaches have been created to tackle this problem; however, these require extensive compute resources, lack interpretability, and in some instances lack replication fidelity as well. We propose a hybrid, probabilistic approach for synthesizing pairwise independent tabular data, called Synthsonic. A sequence of well-understood, invertible statistical transformations removes first-order correlations, then a Bayesian Network jointly models continuous and categorical variables, and a calibrated discriminative learner captures the remaining dependencies. Replication studies on MIT’s SDGym benchmark show marginally or significantly better performance than all prior BN-based approaches, while being competitive with NN-based approaches (first place in 10 out of 13 benchmark datasets). The computational time required to learn the data distribution is at least one order of magnitude lower than the NN methods. Furthermore, inspecting intermediate results during the synthetic data generation allows easy diagnostics and tailored corrections. We believe the combination of out-of-the-box performance, speed and interpretability make this method a significant addition to the synthetic data generation '
  volume: 151
  URL: https://proceedings.mlr.press/v151/baak22a.html
  PDF: https://proceedings.mlr.press/v151/baak22a/baak22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-baak22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Max
    family: Baak
  - given: Simon
    family: Brugman
  - given: Ilan
    family: Fridman Rojas
  - given: Lorraine
    family: Dalmeida
  - given: Ralph
    family: E.Q. Urlus
  - given: Jean-Baptiste
    family: Oger
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4747-4763
  id: baak22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4747
  lastpage: 4763
  published: 2022-05-03 00:00:00 +0000
- title: ' Lagrangian manifold Monte Carlo on Monge patches '
  abstract: ' The efficiency of Markov Chain Monte Carlo (MCMC) depends on how the underlying geometry of the problem is taken into account. For distributions with strongly varying curvature, Riemannian metrics help in efficient exploration of the target distribution. Unfortunately, they have significant computational overhead due to e.g. repeated inversion of the metric tensor, and current geometric MCMC methods using the Fisher information matrix to induce the manifold are in practice slow. We propose a new alternative Riemannian metric for MCMC, by embedding the target distribution into a higher-dimensional Euclidean space as a Monge patch, thus using the induced metric determined by direct geometric reasoning. Our metric only requires first-order gradient information and has fast inverse and determinants, and allows reducing the computational complexity of individual iterations from cubic to quadratic in the problem dimensionality. We demonstrate how Lagrangian Monte Carlo in this metric efficiently explores the target distributions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hartmann22a.html
  PDF: https://proceedings.mlr.press/v151/hartmann22a/hartmann22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hartmann22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Marcelo
    family: Hartmann
  - given: Mark
    family: Girolami
  - given: Arto
    family: Klami
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4764-4781
  id: hartmann22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4764
  lastpage: 4781
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal Accounting of Differential Privacy via Characteristic Function '
  abstract: ' Characterizing the privacy degradation over compositions, i.e., privacy accounting, is a fundamental topic in differential privacy (DP) with many applications to differentially private machine learning and federated learning. We propose a unification of recent advances (Renyi DP, privacy profiles, $f$-DP and the PLD formalism) via the characteristic function ($\phi$-function) of a certain dominating privacy loss random variable. We show that our approach allows natural adaptive composition like Renyi DP, provides exactly tight privacy accounting like PLD, and can be (often losslessly) converted to privacy profile and $f$-DP, thus providing $(\epsilon,\delta)$-DP guarantees and interpretable tradeoff functions. Algorithmically, we propose an analytical Fourier accountant that represents the complex logarithm of $\phi$-functions symbolically and uses Gaussian quadrature for numerical computation. On several popular DP mechanisms and their subsampled counterparts, we demonstrate the flexibility and tightness of our approach in theory and experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhu22c.html
  PDF: https://proceedings.mlr.press/v151/zhu22c/zhu22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhu22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuqing
    family: Zhu
  - given: Jinshuo
    family: Dong
  - given: Yu-Xiang
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4782-4817
  id: zhu22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4782
  lastpage: 4817
  published: 2022-05-03 00:00:00 +0000
- title: ' Adaptive Gaussian Processes on Graphs via Spectral Graph Wavelets '
  abstract: ' Graph-based models require aggregating information in the graph from neighbourhoods of different sizes. In particular, when the data exhibit varying levels of smoothness on the graph, a multi-scale approach is required to capture the relevant information. In this work, we propose a Gaussian process model using spectral graph wavelets, which can naturally aggregate neighbourhood information at different scales. Through maximum likelihood optimisation of the model hyperparameters, the wavelets automatically adapt to the different frequencies in the data, and as a result our model goes beyond capturing low frequency information. We achieve scalability to larger graphs by using a spectrum-adaptive polynomial approximation of the filter function, which is designed to yield a low approximation error in dense areas of the graph spectrum. Synthetic and real-world experiments demonstrate the ability of our model to infer scales accurately and produce competitive performances against state-of-the-art models in graph-based learning tasks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/opolka22a.html
  PDF: https://proceedings.mlr.press/v151/opolka22a/opolka22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-opolka22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Felix
    family: Opolka
  - given: Yin-Cong
    family: Zhi
  - given: Pietro
    family: Lió
  - given: Xiaowen
    family: Dong
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4818-4834
  id: opolka22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4818
  lastpage: 4834
  published: 2022-05-03 00:00:00 +0000
- title: ' Bayesian Link Prediction with Deep Graph Convolutional Gaussian Processes '
  abstract: ' Link prediction aims to reveal missing edges in a graph. We introduce a deep graph convolutional Gaussian process model for this task, which addresses recent challenges in graph machine learning with oversmoothing and overfitting. Using simplified graph convolutions, we transform a Gaussian process to leverage the topological information of the graph domain. To scale the Gaussian process model to larger graphs, we introduce a variational inducing point method that places pseudo-inputs on a graph-structured domain. Multiple Gaussian processes are assembled into a hierarchy whose structure allows skipping convolutions and thus counteracting oversmoothing. The proposed model represents the first Gaussian process for link prediction that makes use of both node features and topological information. We evaluate our model on multiple graph data sets with up to thousands of nodes and report consistent improvements over competitive link prediction approaches. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/opolka22b.html
  PDF: https://proceedings.mlr.press/v151/opolka22b/opolka22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-opolka22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Felix
    family: Opolka
  - given: Pietro
    family: Lió
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4835-4852
  id: opolka22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4835
  lastpage: 4852
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Interplay between Information Loss and Operation Loss in Representations for Classification '
  abstract: ' Information-theoretic measures have been widely adopted in the design of features for learning and decision problems. Inspired by this, we look at the relationship between i) a weak form of information loss in the Shannon sense and ii) operational loss in the minimum probability of error (MPE) sense when considering a family of lossy continuous representations of an observation. Our first result offers a lower bound on a weak form of information loss as a function of its respective operation loss when adopting a discrete lossy representation (quantization) instead of the original raw observation. From this, our main result shows that a specific form of vanishing information loss (a weak notion of asymptotic informational sufficiency) implies a vanishing MPE loss (or asymptotic operational sufficiency) when considering a family of lossy continuous representations. Our theoretical findings support the observation that the selection of feature representations that attempt to capture informational sufficiency is appropriate for learning, but this design principle is rather conservative if the intended goal is achieving MPE in classification. On this last point, we discuss studying weak forms of informational sufficiencies to achieve operational sufficiency in learning settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/silva22b.html
  PDF: https://proceedings.mlr.press/v151/silva22b/silva22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-silva22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jorge
    family: Silva
  - given: Felipe
    family: Tobar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4853-4871
  id: silva22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4853
  lastpage: 4871
  published: 2022-05-03 00:00:00 +0000
- title: ' Pulling back information geometry '
  abstract: ' Latent space geometry has shown itself to provide a rich and rigorous framework for interacting with the latent variables of deep generative models. The existing theory, however, relies on the decoder being a Gaussian distribution as its simple reparametrization allows us to interpret the generating process as a random projection of a deterministic manifold. Consequently, this approach breaks down when applied to decoders that are not as easily reparametrized. We here propose to use the Fisher-Rao metric associated with the space of decoder distributions as a reference metric, which we pull back to the latent space. We show that we can achieve meaningful latent geometries for a wide range of decoder distributions for which the previous theory was not applicable, opening the door to ’black box’ latent geometries. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/arvanitidis22b.html
  PDF: https://proceedings.mlr.press/v151/arvanitidis22b/arvanitidis22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-arvanitidis22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Georgios
    family: Arvanitidis
  - given: Miguel
    family: González-Duque
  - given: Alison
    family: Pouplin
  - given: Dimitrios
    family: Kalatzis
  - given: Soren
    family: Hauberg
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4872-4894
  id: arvanitidis22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4872
  lastpage: 4894
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimizing Early Warning Classifiers to Control False Alarms via a Minimum Precision Constraint '
  abstract: ' Early warning prediction systems can suffer from high false alarm rates that limit utility, especially in settings with high class imbalance such as healthcare. Despite the widespread need to control false alarms, the dominant classifier training paradigm remains minimizing cross entropy, a loss function which does not treat false alarms differently than other types of mistakes. While existing efforts often try to reduce false alarms by post-hoc threshold selection after training, we suggest a comprehensive solution by changing the loss function used to train the classifier. Our proposed objective maximizes recall while enforcing a constraint requiring precision to exceed a specified value. We make our objective tractable for gradient-based optimization by developing tight sigmoidal bounds on the counts needed to compute precision and recall. Our objective is applicable to any classifier trainable via gradient descent, including linear models and neural networks. When predicting mortality risk across two large hospital datasets, we show how our method satisfies a desired constraint on false alarms while achieving better recall than alternatives. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/rath22a.html
  PDF: https://proceedings.mlr.press/v151/rath22a/rath22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-rath22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Preetish
    family: Rath
  - given: Michael
    family: Hughes
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4895-4914
  id: rath22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4895
  lastpage: 4914
  published: 2022-05-03 00:00:00 +0000
- title: ' Resampling Base Distributions of Normalizing Flows '
  abstract: ' Normalizing flows are a popular class of models for approximating probability distributions. However, their invertible nature limits their ability to model target distributions whose support has a complex topological structure, such as Boltzmann distributions. Several procedures have been proposed to solve this problem but many of them sacrifice invertibility and, thereby, tractability of the log-likelihood as well as other desirable properties. To address these limitations, we introduce a base distribution for normalizing flows based on learned rejection sampling, allowing the resulting normalizing flow to model complicated distributions without giving up bijectivity. Furthermore, we develop suitable learning algorithms based on both maximizing the log-likelihood and optimizing the Kullback-Leibler divergence, and apply them to various sample problems, i.e., approximating 2D densities, density estimation of tabular data, image generation, and modeling Boltzmann distributions. In these experiments our method is competitive with or outperforms the baselines. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/stimper22a.html
  PDF: https://proceedings.mlr.press/v151/stimper22a/stimper22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-stimper22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vincent
    family: Stimper
  - given: Bernhard
    family: Schölkopf
  - given: Jose Miguel
    family: Hernandez-Lobato
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4915-4936
  id: stimper22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4915
  lastpage: 4936
  published: 2022-05-03 00:00:00 +0000
- title: ' Federated Myopic Community Detection with One-shot Communication '
  abstract: ' In this paper, we study the problem of recovering the community structure of a network under federated myopic learning. Under this paradigm, we have several clients, each of them having a myopic view, i.e., observing a small subgraph of the network. Each client sends a censored evidence graph to a central server. We provide an efficient algorithm, which computes a consensus signed weighted graph from the clients’ evidence, and recovers the underlying network structure in the central server. We analyze the topological structure conditions of the network, as well as the signal and noise levels of the clients that allow for recovery of the network structure. Our analysis shows that exact recovery is possible and can be achieved in polynomial time. In addition, our experiments show that in an extremely sparse network with 10000 nodes, our method can achieve exact recovery of the community structure even if every client has access to only 20 nodes. We also provide information-theoretic limits for the central server to recover the network structure from any single client’s evidence. Finally, as a byproduct of our analysis, we provide a novel Cheeger-type inequality for general signed weighted graphs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ke22a.html
  PDF: https://proceedings.mlr.press/v151/ke22a/ke22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ke22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chuyang
    family: Ke
  - given: Jean
    family: Honorio
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4937-4954
  id: ke22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4937
  lastpage: 4954
  published: 2022-05-03 00:00:00 +0000
- title: ' Variational Gaussian Processes: A Functional Analysis View '
  abstract: ' Variational Gaussian process (GP) approximations have become a standard tool in fast GP inference. This technique requires a user to select variational features to increase efficiency. So far the common choices in the literature are disparate and lacking generality. We propose to view the GP as lying in a Banach space which then facilitates a unified perspective. This is used to understand the relationship between existing features and to draw a connection between kernel ridge regression and variational GP approximations. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wynne22a.html
  PDF: https://proceedings.mlr.press/v151/wynne22a/wynne22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wynne22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: George
    family: Wynne
  - given: Veit
    family: Wild
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4955-4971
  id: wynne22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4955
  lastpage: 4971
  published: 2022-05-03 00:00:00 +0000
- title: ' Adversarially Robust Kernel Smoothing '
  abstract: ' We propose a scalable robust learning algorithm combining kernel smoothing and robust optimization. Our method is motivated by the convex analysis perspective of distributionally robust optimization based on probability metrics, such as the Wasserstein distance and the maximum mean discrepancy. We adapt the integral operator using supremal convolution in convex analysis to form a novel function majorant used for enforcing robustness. Our method is simple in form and applies to general loss functions and machine learning models. Exploiting a connection with optimal transport, we prove theoretical guarantees for certified robustness under distribution shift. Furthermore, we report experiments with general machine learning models, such as deep neural networks, to demonstrate competitive performance with the state-of-the-art certifiable robust learning algorithms based on the Wasserstein distance. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhu22d.html
  PDF: https://proceedings.mlr.press/v151/zhu22d/zhu22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhu22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jia-Jie
    family: Zhu
  - given: Christina
    family: Kouridi
  - given: Yassine
    family: Nemmour
  - given: Bernhard
    family: Schölkopf
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4972-4994
  id: zhu22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4972
  lastpage: 4994
  published: 2022-05-03 00:00:00 +0000
- title: ' Faster Unbalanced Optimal Transport: Translation invariant Sinkhorn and 1-D Frank-Wolfe '
  abstract: ' Unbalanced optimal transport (UOT) extends optimal transport (OT) to take into account mass variations when comparing distributions. This is crucial for successful ML applications of OT, as it makes it robust to data normalization and outliers. The baseline algorithm is Sinkhorn, but its convergence speed might be significantly slower for UOT than for OT. In this work, we identify the cause for this deficiency, namely the lack of a global normalization of the iterates, which equivalently corresponds to a translation of the dual OT potentials. Our first contribution leverages this idea to develop an accelerated Sinkhorn algorithm (coined "translation invariant Sinkhorn") for UOT, bridging the computational gap with OT. Our second contribution focuses on 1-D UOT and proposes a Frank-Wolfe solver applied to this translation invariant formulation. The linear oracle of each step amounts to solving a 1-D OT problem, resulting in a linear time complexity per iteration. Our last contribution extends this method to the computation of UOT barycenter of 1-D measures. Numerical simulations showcase the convergence speed improvement brought by these three approaches. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sejourne22a.html
  PDF: https://proceedings.mlr.press/v151/sejourne22a/sejourne22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sejourne22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Thibault
    family: Sejourne
  - given: Francois-Xavier
    family: Vialard
  - given: Gabriel
    family: Peyré
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 4995-5021
  id: sejourne22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 4995
  lastpage: 5021
  published: 2022-05-03 00:00:00 +0000
- title: ' Off-Policy Risk Assessment for Markov Decision Processes '
  abstract: ' Addressing such diverse ends as mitigating safety risks, aligning agent behavior with human preferences, and improving the efficiency of learning, an emerging line of reinforcement learning research addresses the entire distribution of returns and various risk functionals that depend upon it. In the contextual bandit setting, recent work on off-policy risk assessment (OPRA) estimates the target policy’s CDF of returns, providing finite sample guarantees that extend to (and hold simultaneously over) plugin estimates of an arbitrarily large set of risk functionals. In this paper, we lift OPRA to Markov decision processes (MDPs), where importance sampling (IS) CDF estimators suffer high variance on longer trajectories due to vanishing (and exploding) importance weights. To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs. The DR estimator enjoys significantly less variance and, when the model is well specified, achieves the Cramer-Rao variance lower bound. Moreover, for many risk functionals, the downstream estimates enjoy both lower bias and lower variance. Additionally, we derive the first minimax lower bounds for off-policy CDF and risk estimation, which match our error bounds up to a constant. Finally, we demonstrate the efficacy of our DR CDF estimates experimentally on several different environments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/huang22b.html
  PDF: https://proceedings.mlr.press/v151/huang22b/huang22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-huang22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Audrey
    family: Huang
  - given: Liu
    family: Leqi
  - given: Zachary
    family: Lipton
  - given: Kamyar
    family: Azizzadenesheli
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5022-5050
  id: huang22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5022
  lastpage: 5050
  published: 2022-05-03 00:00:00 +0000
- title: ' Predictive variational Bayesian inference as risk-seeking optimization '
  abstract: ' Since Bayesian inference works poorly under model misspecification, various solutions have been explored to counteract its shortcomings. The recently proposed predictive Bayes (PB), which directly optimizes the Kullback-Leibler divergence between the empirical distribution and the approximate predictive distribution, shows excellent performance not only under model misspecification but also for over-parametrized models. However, its behavior and superiority are still unclear, which limits the applications of PB. Specifically, the superiority of PB has been shown only in terms of the predictive test log-likelihood, and its performance in the sense of parameter estimation has not been investigated yet. Also, it is not clear why PB is superior with misspecified and over-parametrized models. In this paper, we clarify these ambiguities by studying PB in the framework of risk-seeking optimization. To achieve this, we first provide a consistency theory for PB and then present intuition for the robustness of PB to model misspecification using a response function theory. Thereafter, we theoretically and numerically show that PB has an implicit regularization effect that leads to flat local minima in over-parametrized models. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/futami22a.html
  PDF: https://proceedings.mlr.press/v151/futami22a/futami22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-futami22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Futoshi
    family: Futami
  - given: Tomoharu
    family: Iwata
  - given: Naonori
    family: Ueda
  - given: Issei
    family: Sato
  - given: Masashi
    family: Sugiyama
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5051-5083
  id: futami22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5051
  lastpage: 5083
  published: 2022-05-03 00:00:00 +0000
- title: ' Kantorovich Mechanism for Pufferfish Privacy '
  abstract: ' Pufferfish privacy achieves $\epsilon$-indistinguishability over a set of secret pairs in the disclosed data. This paper studies how to attain $\epsilon$-pufferfish privacy by the exponential mechanism, an additive noise scheme that generalizes the Laplace noise. It is shown that the disclosed data is $\epsilon$-pufferfish private if the noise is calibrated to the sensitivity of the Kantorovich optimal transport plan. Such a plan can be obtained directly from the data statistics conditioned on the secret, the prior knowledge of the system. The sufficient condition is further relaxed to reduce the noise power. It is also proved that the Gaussian mechanism based on the Kantorovich approach attains the $\delta$-approximation of $\epsilon$-pufferfish privacy. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ding22b.html
  PDF: https://proceedings.mlr.press/v151/ding22b/ding22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ding22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ni
    family: Ding
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5084-5103
  id: ding22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5084
  lastpage: 5103
  published: 2022-05-03 00:00:00 +0000
- title: ' Margin-distancing for safe model explanation '
  abstract: ' The growing use of machine learning models in consequential settings has highlighted an important and seemingly irreconcilable tension between transparency and vulnerability to gaming. While this has sparked sizable debate in legal literature, there has been comparatively less technical study of this contention. In this work, we propose a clean-cut formulation of this tension and a way to make the tradeoff between transparency and gaming. We identify the source of gaming as being points close to the decision boundary of the model. And we initiate an investigation on how to provide example-based explanations that are expansive and yet consistent with a version space that is sufficiently uncertain with respect to the boundary points’ labels. Finally, we furnish our theoretical results with empirical investigations of this tradeoff on real-world datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yan22a.html
  PDF: https://proceedings.mlr.press/v151/yan22a/yan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tom
    family: Yan
  - given: Chicheng
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5104-5134
  id: yan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5104
  lastpage: 5134
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal transport with $f$-divergence regularization and generalized Sinkhorn algorithm '
  abstract: ' Entropic regularization provides a generalization of the original optimal transport problem. It introduces a penalty term defined by the Kullback-Leibler divergence, making the problem more tractable via the celebrated Sinkhorn algorithm. Replacing the Kullback-Leibler divergence with a general $f$-divergence leads to a natural generalization. The case of divergences defined by superlinear functions was recently studied by Di Marino and Gerolin. Using convex analysis, we extend the theory developed so far to include all $f$-divergences defined by functions of Legendre type, and prove that under some mild conditions, strong duality holds, optimums in both the primal and dual problems are attained, the generalization of the $c$-transform is well-defined, and we give sufficient conditions for the generalized Sinkhorn algorithm to converge to an optimal solution. We propose a practical algorithm for computing an approximate solution of the optimal transport problem with $f$-divergence regularization via the generalized Sinkhorn algorithm. Finally, we present experimental results on synthetic 2-dimensional data, demonstrating the effects of using different $f$-divergences for regularization, which influences convergence speed, numerical stability and sparsity of the optimal coupling. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/terjek22a.html
  PDF: https://proceedings.mlr.press/v151/terjek22a/terjek22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-terjek22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dávid
    family: Terjék
  - given: Diego
    family: González-Sánchez
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5135-5165
  id: terjek22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5135
  lastpage: 5165
  published: 2022-05-03 00:00:00 +0000
- title: ' Modelling Non-Smooth Signals with Complex Spectral Structure '
  abstract: ' The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bruinsma22a.html
  PDF: https://proceedings.mlr.press/v151/bruinsma22a/bruinsma22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bruinsma22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Wessel P.
    family: Bruinsma
  - given: Martin
    family: Tegnér
  - given: Richard E.
    family: Turner
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5166-5195
  id: bruinsma22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5166
  lastpage: 5195
  published: 2022-05-03 00:00:00 +0000
- title: ' Convergent Working Set Algorithm for Lasso with Non-Convex Sparse Regularizers '
  abstract: ' Non-convex sparse regularizers are common tools for learning with high-dimensional data. To accelerate convergence for Lasso problems involving these regularizers, a working set strategy addresses the optimization problem through an iterative algorithm by gradually incrementing the number of variables to optimize until the identification of the solution support. We propose in this paper the first Lasso working set algorithm for non-convex sparse regularizers with convergence guarantees. The algorithm, named FireWorks, is based on a non-convex reformulation of a recent duality-based approach and leverages the geometry of the residuals. We provide theoretical guarantees showing that convergence is preserved even when the inner solver is inexact, under sufficient decay of the error across iterations. Experimental results demonstrate strong computational gain when using our working set strategy compared to full problem solvers for both block-coordinate descent and proximal gradient solvers. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/rakotomamonjy22a.html
  PDF: https://proceedings.mlr.press/v151/rakotomamonjy22a/rakotomamonjy22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-rakotomamonjy22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alain
    family: Rakotomamonjy
  - given: Rémi
    family: Flamary
  - given: Joseph
    family: Salmon
  - given: Gilles
    family: Gasso
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5196-5211
  id: rakotomamonjy22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5196
  lastpage: 5211
  published: 2022-05-03 00:00:00 +0000
- title: ' Particle-based Adversarial Local Distribution Regularization '
  abstract: ' Adversarial training defense (ATD) and virtual adversarial training (VAT) are the two most effective methods to improve model robustness against attacks and model generalization. While ATD is usually applied in robust machine learning, VAT is used in semi-supervised learning and domain adaptation. In this paper, we introduce a novel adversarial local distribution regularization. The adversarial local distribution is defined by a set of all adversarial examples within a ball constraint given a natural input. We show that this regularization is a general form of previous methods (e.g., PGD, TRADES, VAT and VADA). We conduct comprehensive experiments on MNIST, SVHN and CIFAR10 to illustrate that our method outperforms well-known methods such as PGD, TRADES and ATD in robust machine learning, VAT in semi-supervised learning and VADA in domain adaptation. Our implementation is on GitHub: https://github.com/PotatoThanh/ALD-Regularization. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nguyen-duc22a.html
  PDF: https://proceedings.mlr.press/v151/nguyen-duc22a/nguyen-duc22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nguyen-duc22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Thanh
    family: Nguyen-Duc
  - given: Trung
    family: Le
  - given: He
    family: Zhao
  - given: Jianfei
    family: Cai
  - given: Dinh
    family: Phung
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5212-5224
  id: nguyen-duc22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5212
  lastpage: 5224
  published: 2022-05-03 00:00:00 +0000
- title: ' Crowdsourcing Regression: A Spectral Approach '
  abstract: ' Merging the predictions of multiple experts is a frequent task. When ground-truth response values are available, this merging is often based on the estimated accuracies of the experts. In various applications, however, the only available information is the experts’ predictions on unlabeled test data, which do not allow their accuracies to be estimated directly. Moreover, simple merging schemes, such as majority voting in classification or the ensemble mean or median in regression, are clearly sub-optimal when some experts are more accurate than others. Focusing on regression tasks, in this work we propose U-PCR, a framework for unsupervised ensemble regression. Specifically, we develop spectral-based methods that, under mild assumptions and in the absence of ground truth data, are able to estimate the mean squared error of the different experts and combine their predictions into a more accurate meta-learner. We provide theoretical support for U-PCR as well as empirical evidence for the validity of its underlying assumptions. On a variety of regression problems, we illustrate the improved accuracy of U-PCR over various unsupervised merging strategies. Finally, we also illustrate its applicability to unsupervised multi-class ensemble learning. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tenzer22a.html
  PDF: https://proceedings.mlr.press/v151/tenzer22a/tenzer22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tenzer22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yaniv
    family: Tenzer
  - given: Omer
    family: Dror
  - given: Boaz
    family: Nadler
  - given: Erhan
    family: Bilal
  - given: Yuval
    family: Kluger
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5225-5242
  id: tenzer22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5225
  lastpage: 5242
  published: 2022-05-03 00:00:00 +0000
- title: ' How to scale hyperparameters for quickshift image segmentation '
  abstract: ' Quickshift is a popular algorithm for image segmentation, used as a preprocessing step in many applications. Unfortunately, it is quite challenging to understand the hyperparameters’ influence on the number and shape of superpixels produced by the method. In this paper, we study theoretically a slightly modified version of the quickshift algorithm, with a particular emphasis on homogeneous image patches with i.i.d. pixel noise and sharp boundaries between such patches. Leveraging this analysis, we derive a simple heuristic to scale quickshift hyperparameters with respect to the image size, which we check empirically. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/garreau22a.html
  PDF: https://proceedings.mlr.press/v151/garreau22a/garreau22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-garreau22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Damien
    family: Garreau
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5243-5275
  id: garreau22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5243
  lastpage: 5275
  published: 2022-05-03 00:00:00 +0000
- title: ' Wide Mean-Field Bayesian Neural Networks Ignore the Data '
  abstract: ' Bayesian neural networks (BNNs) combine the expressive power of deep learning with the advantages of Bayesian formalism. In recent years, the analysis of wide, deep BNNs has provided theoretical insight into their priors and posteriors. However, we have no analogous insight into their posteriors under approximate inference. In this work, we show that mean-field variational inference <em>entirely fails to model the data</em> when the network width is large and the activation function is odd. Specifically, for fully-connected BNNs with odd activation functions and a homoscedastic Gaussian likelihood, we show that the <em>optimal</em> mean-field variational posterior predictive (i.e., function space) distribution converges to the prior predictive distribution as the width tends to infinity. We generalize aspects of this result to other likelihoods. Our theoretical results are suggestive of underfitting behavior previously observed in BNNs. While our convergence bounds are non-asymptotic and constants in our analysis can be computed, they are currently too loose to be applicable in standard training regimes. Finally, we show that the optimal approximate posterior need not tend to the prior if the activation function is not odd, showing that our statements cannot be generalized arbitrarily. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/coker22a.html
  PDF: https://proceedings.mlr.press/v151/coker22a/coker22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-coker22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Beau
    family: Coker
  - given: Wessel P.
    family: Bruinsma
  - given: David R.
    family: Burt
  - given: Weiwei
    family: Pan
  - given: Finale
    family: Doshi-Velez
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5276-5333
  id: coker22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5276
  lastpage: 5333
  published: 2022-05-03 00:00:00 +0000
- title: ' Privacy Amplification by Decentralization '
  abstract: ' Analyzing data owned by several parties while achieving a good trade-off between utility and privacy is a key challenge in federated learning and analytics. In this work, we introduce a novel relaxation of local differential privacy (LDP) that naturally arises in fully decentralized algorithms, i.e., when participants exchange information by communicating along the edges of a network graph without a central coordinator. This relaxation, which we call network DP, captures the fact that users have only a local view of the system. To show the relevance of network DP, we study a decentralized model of computation where a token performs a walk on the network graph and is updated sequentially by the party who receives it. For tasks such as real summation, histogram computation and optimization with gradient descent, we propose simple algorithms on ring and complete topologies. We prove that the privacy-utility trade-offs of our algorithms under network DP significantly improve upon what is achievable under LDP, and often match the utility of the trusted curator model. Our results show for the first time that formal privacy gains can be obtained from full decentralization. We also provide experiments to illustrate the improved utility of our approach for decentralized training with stochastic gradient descent. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cyffers22a.html
  PDF: https://proceedings.mlr.press/v151/cyffers22a/cyffers22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cyffers22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Edwige
    family: Cyffers
  - given: Aurélien
    family: Bellet
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5334-5353
  id: cyffers22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5334
  lastpage: 5353
  published: 2022-05-03 00:00:00 +0000
- title: ' Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems '
  abstract: ' In this work, we study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems. When learning a dynamical system, one needs to stabilize the unknown dynamics in order to avoid system blow-ups. We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment with an improved exploration strategy. We show that the proposed algorithm attains $\Tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction. We also show that the regret of the proposed algorithm has only a polynomial dependence in the problem dimensions, which gives an exponential improvement over the prior methods. Our improved exploration method is simple, yet efficient, and it combines a sophisticated exploration policy in RL with an isotropic exploration strategy to achieve fast stabilization and improved regret. We empirically demonstrate that the proposed algorithm outperforms other popular methods in several adaptive control tasks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lale22a.html
  PDF: https://proceedings.mlr.press/v151/lale22a/lale22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lale22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sahin
    family: Lale
  - given: Kamyar
    family: Azizzadenesheli
  - given: Babak
    family: Hassibi
  - given: Animashree
    family: Anandkumar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5354-5390
  id: lale22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5354
  lastpage: 5390
  published: 2022-05-03 00:00:00 +0000
- title: ' New Coresets for Projective Clustering and Applications '
  abstract: ' $(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure. In this paper, we propose the first algorithm that returns an $L_\infty$ coreset of size polynomial in $d$. Moreover, we give the first strong coreset construction for general $M$-estimator regression. Specifically, we show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, $L_1-L_2$, and Fair regression, as well as general concave and power-bounded loss functions. Finally, we provide experimental results based on real-world datasets, showing the efficacy of our approach. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tukan22a.html
  PDF: https://proceedings.mlr.press/v151/tukan22a/tukan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tukan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Murad
    family: Tukan
  - given: Xuan
    family: Wu
  - given: Samson
    family: Zhou
  - given: Vladimir
    family: Braverman
  - given: Dan
    family: Feldman
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5391-5415
  id: tukan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5391
  lastpage: 5415
  published: 2022-05-03 00:00:00 +0000
- title: ' Weak Separation in Mixture Models and Implications for Principal Stratification '
  abstract: ' Principal stratification is a popular framework for addressing post-randomization complications, often in conjunction with finite mixture models for estimating the causal effects of interest. Unfortunately, standard estimators of mixture parameters, like the MLE, are known to exhibit pathological behavior. We study this behavior in a simple but fundamental example, a two-component Gaussian mixture model in which only the component means and variances are unknown, and focus on the setting in which the components are weakly separated. In this case, we show that the asymptotic convergence rate of the MLE is quite poor, such as $O(n^{-1/6})$ or even $O(n^{-1/8})$. We then demonstrate via both theoretical arguments and extensive simulations that the MLE behaves like a threshold estimator in finite samples, in the sense that the MLE can give strong evidence that the means are equal when the truth is otherwise. We also explore the behavior of the MLE when the MLE is non-zero, showing that it is difficult to estimate both the sign and magnitude of the means in this case. We provide diagnostics for all of these pathologies and apply these ideas to re-analyzing two randomized evaluations of job training programs, JOBS II and Job Corps. Our results suggest that the corresponding maximum likelihood estimates should be interpreted with caution in these cases. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ho22b.html
  PDF: https://proceedings.mlr.press/v151/ho22b/ho22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ho22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nhat
    family: Ho
  - given: Avi
    family: Feller
  - given: Evan
    family: Greif
  - given: Luke
    family: Miratrix
  - given: Natesh
    family: Pillai
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5416-5458
  id: ho22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5416
  lastpage: 5458
  published: 2022-05-03 00:00:00 +0000
- title: ' Using time-series privileged information for provably efficient learning of prediction models '
  abstract: ' We study prediction of future outcomes with supervised models that use privileged information during learning. The privileged information comprises samples of time series observed between the baseline time of prediction and the future outcome; this information is only available at training time which differs from the traditional supervised learning. Our question is when using this privileged data leads to more sample-efficient learning of models that use only baseline data for predictions at test time. We give an algorithm for this setting and prove that when the time series are drawn from a non-stationary Gaussian-linear dynamical system of fixed horizon, learning with privileged information is more efficient than learning without it. On synthetic data, we test the limits of our algorithm and theory, both when our assumptions hold and when they are violated. On three diverse real-world datasets, we show that our approach is generally preferable to classical learning, particularly when data is scarce. Finally, we relate our estimator to a distillation approach both theoretically and empirically. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/k-a-karlsson22a.html
  PDF: https://proceedings.mlr.press/v151/k-a-karlsson22a/k-a-karlsson22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-k-a-karlsson22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rickard
    family: K.A. Karlsson
  - given: Martin
    family: Willbo
  - given: Zeshan M.
    family: Hussain
  - given: Rahul G.
    family: Krishnan
  - given: David
    family: Sontag
  - given: Fredrik
    family: Johansson
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5459-5484
  id: k-a-karlsson22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5459
  lastpage: 5484
  published: 2022-05-03 00:00:00 +0000
- title: ' Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity '
  abstract: ' Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training. Despite its desirable simplicity, recent work shows inferior convergence rates of GDA in theory, even when assuming strong concavity of the objective in terms of one variable. This paper establishes new convergence results for two alternative single-loop algorithms – alternating GDA and smoothed GDA – under the mild assumption that the objective satisfies the Polyak-Lojasiewicz (PL) condition about one variable. We prove that, to find an $\epsilon$-stationary point, (i) alternating GDA and its stochastic variant (without mini batch) respectively require $O(\kappa^{2} \epsilon^{-2})$ and $O(\kappa^{4} \epsilon^{-4})$ iterations, while (ii) smoothed GDA and its stochastic variant (without mini batch) respectively require $O(\kappa \epsilon^{-2})$ and $O(\kappa^{2} \epsilon^{-4})$ iterations. The latter greatly improves over the vanilla GDA and gives the hitherto best known complexity results among single-loop algorithms under similar settings. We further showcase the empirical efficiency of these algorithms in training GANs and robust nonlinear regression. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yang22b.html
  PDF: https://proceedings.mlr.press/v151/yang22b/yang22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yang22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Junchi
    family: Yang
  - given: Antonio
    family: Orvieto
  - given: Aurelien
    family: Lucchi
  - given: Niao
    family: He
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5485-5517
  id: yang22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5485
  lastpage: 5517
  published: 2022-05-03 00:00:00 +0000
- title: ' TD-GEN: Graph Generation Using Tree Decomposition '
  abstract: ' We propose TD-GEN, a graph generation framework based on tree decomposition, and introduce a reduced upper bound on the maximum number of decisions needed for graph generation. The framework includes a permutation invariant tree generation model which forms the backbone of graph generation. Tree nodes are supernodes, each representing a cluster of nodes in the graph. Graph nodes and edges are incrementally generated inside the clusters by traversing the tree supernodes, respecting the structure of the tree decomposition, and following node sharing decisions between the clusters. Further, we discuss the shortcomings of the standard evaluation criteria based on statistical properties of the generated graphs. We propose to compare the generalizability of models based on expected likelihood. Empirical results on a variety of standard graph generation datasets demonstrate the superior performance of our method. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shirzad22a.html
  PDF: https://proceedings.mlr.press/v151/shirzad22a/shirzad22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shirzad22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hamed
    family: Shirzad
  - given: Hossein
    family: Hajimirsadeghi
  - given: Amir H.
    family: Abdi
  - given: Greg
    family: Mori
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5518-5537
  id: shirzad22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5518
  lastpage: 5537
  published: 2022-05-03 00:00:00 +0000
- title: ' Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge '
  abstract: ' Treatment effect estimation from observational data is a fundamental problem in causal inference. There are two very different schools of thought that have tackled this problem. On the one hand, the Pearlian framework commonly assumes structural knowledge (provided by an expert) in the form of directed acyclic graphs and provides graphical criteria such as the back-door criterion to identify the valid adjustment sets. On the other hand, the potential outcomes (PO) framework commonly assumes that all the observed features satisfy ignorability (i.e., no hidden confounding), which in general is untestable. In prior works that attempted to bridge these frameworks, there is an observational criterion to identify an <em>anchor variable</em> and if a subset of covariates (not involving the anchor variable) passes a suitable conditional independence criterion, then that subset is a valid back-door. Our main result strengthens these prior results by showing that under a different form of expert-driven structural knowledge (that one variable is a direct causal parent of the treatment variable), remarkably, testing for subsets (not involving the known parent variable) that are valid back-doors is <em>equivalent</em> to an invariance test. Importantly, we also cover the non-trivial case where the entire set of observed features is not ignorable (generalizing the PO framework) without requiring the knowledge of all the parents of the treatment variable. Our key technical idea involves generation of a synthetic sub-sampling (or environment) variable that is a function of the known parent variable. In addition to designing an invariance test, this sub-sampling variable allows us to leverage Invariant Risk Minimization, and thus, connects finding valid adjustments (in non-ignorable observational settings) to representation learning. We demonstrate the effectiveness and tradeoffs of these approaches on a variety of synthetic datasets as well as real causal effect estimation benchmarks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shah22a.html
  PDF: https://proceedings.mlr.press/v151/shah22a/shah22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shah22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Abhin
    family: Shah
  - given: Karthikeyan
    family: Shanmugam
  - given: Kartik
    family: Ahuja
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5538-5562
  id: shah22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5538
  lastpage: 5562
  published: 2022-05-03 00:00:00 +0000
- title: ' SHAFF: Fast and consistent SHApley eFfect estimates via random Forests '
  abstract: ' Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools. Shapley effects are now widely used to interpret both tree ensembles and neural networks, as they can efficiently handle dependence and interactions in the data, as opposed to most other variable importance measures. However, estimating Shapley effects is a challenging task, because of the computational complexity and the conditional expectation estimates. Accordingly, existing Shapley algorithms have flaws: a costly running time, or a bias when input variables are dependent. Therefore, we introduce SHAFF, SHApley eFfects via random Forests, a fast and accurate Shapley effect estimate, even when input variables are dependent. We show SHAFF efficiency through both a theoretical analysis of its consistency, and the practical performance improvements over competitors with extensive experiments. An implementation of SHAFF in C++ and R is available online. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/benard22a.html
  PDF: https://proceedings.mlr.press/v151/benard22a/benard22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-benard22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Clément
    family: Bénard
  - given: Gérard
    family: Biau
  - given: Sébastien
    family: Da Veiga
  - given: Erwan
    family: Scornet
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5563-5582
  id: benard22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5563
  lastpage: 5582
  published: 2022-05-03 00:00:00 +0000
- title: ' Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions '
  abstract: ' A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG) is that using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention, and is also the focus of this work. We prove two main results. The first is a new universal lower bound on the number of atomic interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of atomic interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. The proof of our lower bound is based on the new notion of clique-block shared-parents (CBSP) orderings, which are topological orderings of DAGs without v-structures and satisfy certain special properties. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/porwal22a.html
  PDF: https://proceedings.mlr.press/v151/porwal22a/porwal22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-porwal22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vibhor
    family: Porwal
  - given: Piyush
    family: Srivastava
  - given: Gaurav
    family: Sinha
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5583-5603
  id: porwal22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5583
  lastpage: 5603
  published: 2022-05-03 00:00:00 +0000
- title: ' Estimators of Entropy and Information via Inference in Probabilistic Models '
  abstract: ' Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents <em>estimators of entropy via inference</em> (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient’s insulin sensitivity, given their meal and medication schedule. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/saad22a.html
  PDF: https://proceedings.mlr.press/v151/saad22a/saad22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-saad22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Feras
    family: Saad
  - given: Marco
    family: Cusumano-Towner
  - given: Vikash
    family: Mansinghka
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5604-5621
  id: saad22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5604
  lastpage: 5621
  published: 2022-05-03 00:00:00 +0000
- title: ' Adaptive Private-K-Selection with Adaptive K and Application to Multi-label PATE '
  abstract: ' We provide an end-to-end Rényi DP-based framework for differentially private top-$k$ selection. Unlike previous approaches, which require a data-independent choice of $k$, we propose to privately release a data-dependent choice of $k$ such that the gap between the $k$-th and the $(k+1)$-st “quality” is large. This is achieved by an extension of the Report-Noisy-Max algorithm with a more concentrated Gaussian noise. Not only does this eliminate one hyperparameter, the adaptive choice of $k$ also certifies the stability of the top-$k$ indices in the unordered set so we can release them using a combination of the propose-test-release (PTR) framework and the Distance-to-Stability mechanism. We show that our construction improves the privacy-utility trade-offs compared to the previous top-$k$ selection algorithms theoretically and empirically. Additionally, we apply our algorithm to “Private Aggregation of Teacher Ensembles (PATE)” in multi-label classification tasks with a large number of labels and show that it leads to significant performance gains. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhu22e.html
  PDF: https://proceedings.mlr.press/v151/zhu22e/zhu22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhu22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuqing
    family: Zhu
  - given: Yu-Xiang
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5622-5635
  id: zhu22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5622
  lastpage: 5635
  published: 2022-05-03 00:00:00 +0000
- title: ' Fast and accurate optimization on the orthogonal manifold without retraction '
  abstract: ' We consider the problem of minimizing a function over the manifold of orthogonal matrices. The majority of algorithms for this problem compute a direction in the tangent space, and then use a retraction to move in that direction while staying on the manifold. Unfortunately, the numerical computation of retractions on the orthogonal manifold always involves some expensive linear algebra operation, such as matrix inversion, exponential or square-root. These operations quickly become expensive as the dimension of the matrices grows. To bypass this limitation, we propose the landing algorithm which does not use retractions. The algorithm is not constrained to stay on the manifold but its evolution is driven by a potential energy which progressively attracts it towards the manifold. One iteration of the landing algorithm only involves matrix multiplications, which makes it cheap compared to its retraction counterparts. We provide an analysis of the convergence of the algorithm, and demonstrate its promise on large-scale and deep learning problems, where it is faster and less prone to numerical errors than retraction-based methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ablin22a.html
  PDF: https://proceedings.mlr.press/v151/ablin22a/ablin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ablin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pierre
    family: Ablin
  - given: Gabriel
    family: Peyré
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5636-5657
  id: ablin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5636
  lastpage: 5657
  published: 2022-05-03 00:00:00 +0000
- title: ' Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms '
  abstract: ' In Reinforcement Learning, the optimal action at a given state is dependent on policy decisions at subsequent states. As a consequence, the learning targets evolve with time and the policy optimization process must be efficient at unlearning what it previously learnt. In this paper, we discover that the policy gradient theorem prescribes policy updates that are slow to unlearn because of their structural symmetry with respect to the value target. To increase the unlearning speed, we study a novel policy update: the gradient of the cross-entropy loss with respect to the action maximizing $q$, but find that such updates may lead to a decrease in value. Consequently, we introduce a modified policy update devoid of that flaw, and prove its guarantees of convergence to global optimality in $\mathcal{O}(t^{-1})$ under classic assumptions. Further, we assess standard policy updates and our cross-entropy policy updates along six analytical dimensions. Finally, we empirically validate our theoretical findings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/laroche22a.html
  PDF: https://proceedings.mlr.press/v151/laroche22a/laroche22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-laroche22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Romain
    family: Laroche
  - given: Remi
    family: Tachet Des Combes
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5658-5688
  id: laroche22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5658
  lastpage: 5688
  published: 2022-05-03 00:00:00 +0000
- title: ' Efficient Kernelized UCB for Contextual Bandits '
  abstract: ' In this paper, we tackle the computational efficiency of kernelized UCB algorithms in contextual bandits. While standard methods require $\mathcal{O}(CT^3)$ complexity where $T$ is the horizon and the constant $C$ is related to optimizing the UCB rule, we propose an efficient contextual algorithm for large-scale problems. Specifically, our method relies on incremental Nyström approximations of the joint kernel embedding of contexts and actions. This allows us to achieve a complexity of $\mathcal{O}(CTm^2)$ where $m$ is the number of Nyström points. To recover the same regret as the standard kernelized UCB algorithm, $m$ needs to be of the order of the effective dimension of the problem, which is at most $\mathcal{O}(\sqrt{T})$ and nearly constant in some cases. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zenati22a.html
  PDF: https://proceedings.mlr.press/v151/zenati22a/zenati22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zenati22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Houssam
    family: Zenati
  - given: Alberto
    family: Bietti
  - given: Eustache
    family: Diemert
  - given: Julien
    family: Mairal
  - given: Matthieu
    family: Martin
  - given: Pierre
    family: Gaillard
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5689-5720
  id: zenati22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5689
  lastpage: 5720
  published: 2022-05-03 00:00:00 +0000
- title: ' Acceleration in Distributed Optimization under Similarity '
  abstract: ' We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes. The loss functions of the agents are assumed to be similar, due to statistical data similarity or otherwise. In order to reduce the number of communications to reach a solution accuracy, we propose a preconditioned, accelerated distributed method. An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log(1/\varepsilon)\big)$ communication steps, where $\beta/\mu$ is the relative condition number between the global and local loss functions, and $\rho$ characterizes the connectivity of the network. This rate matches (up to poly-log factors) lower communication complexity bounds of distributed gossip-algorithms applied to the class of problems of interest. Numerical results show significant communication savings with respect to existing accelerated distributed schemes, especially when solving ill-conditioned problems. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tian22b.html
  PDF: https://proceedings.mlr.press/v151/tian22b/tian22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tian22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ye
    family: Tian
  - given: Gesualdo
    family: Scutari
  - given: Tianyu
    family: Cao
  - given: Alexander
    family: Gasnikov
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5721-5756
  id: tian22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5721
  lastpage: 5756
  published: 2022-05-03 00:00:00 +0000
- title: ' Corruption-robust Offline Reinforcement Learning '
  abstract: ' We study adversarial robustness in offline reinforcement learning. Given a batch dataset consisting of tuples $(s, a, r, s^\prime)$, an adversary is allowed to arbitrarily modify an $\epsilon$ fraction of the tuples. From the corrupted dataset the learner aims to robustly identify a near-optimal policy. We first show that a worst-case $\Omega(d\epsilon)$ optimality gap is unavoidable in linear MDPs of dimension $d$, even if the adversary only corrupts the reward element in a tuple. This contrasts with dimension-free results in robust supervised learning and the best-known lower bound in the online RL setting with corruption. Next, we propose robust variants of the Least-Square Value Iteration (LSVI) algorithm utilizing robust supervised learning oracles, which achieve near-matching performances in cases both with and without full data coverage. The algorithm requires the knowledge of $\epsilon$ to design the pessimism bonus in the no-coverage case. Surprisingly, in this case, the knowledge of $\epsilon$ is necessary, as we show that being adaptive to unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhang22c.html
  PDF: https://proceedings.mlr.press/v151/zhang22c/zhang22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhang22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xuezhou
    family: Zhang
  - given: Yiding
    family: Chen
  - given: Xiaojin
    family: Zhu
  - given: Wen
    family: Sun
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5757-5773
  id: zhang22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5757
  lastpage: 5773
  published: 2022-05-03 00:00:00 +0000
- title: ' A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning '
  abstract: ' Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performance. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions into a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, due to biases in the gradients, the quantile regression loss is used instead for training, guaranteeing the same minimizer and enjoying unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low complexity algorithm to compute the Cramér distance. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lheritier22a.html
  PDF: https://proceedings.mlr.press/v151/lheritier22a/lheritier22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lheritier22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alix
    family: Lheritier
  - given: Nicolas
    family: Bondoux
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5774-5789
  id: lheritier22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5774
  lastpage: 5789
  published: 2022-05-03 00:00:00 +0000
- title: ' Orbital MCMC '
  abstract: ' Markov Chain Monte Carlo (MCMC) algorithms ubiquitously employ complex deterministic transformations to generate proposal points that are then filtered by the Metropolis-Hastings-Green (MHG) test. However, the condition of the target measure invariance puts restrictions on the design of these transformations. In this paper, we first derive the acceptance test for the stochastic Markov kernel considering arbitrary deterministic maps as proposal generators. When applied to the transformations with orbits of period two (involutions), the test reduces to the MHG test. Based on the derived test we propose two practical algorithms: one operates by constructing periodic orbits from any diffeomorphism, another on contractions of the state space (such as optimization trajectories). Finally, we perform an empirical study demonstrating the practical advantages of both kernels. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/neklyudov22a.html
  PDF: https://proceedings.mlr.press/v151/neklyudov22a/neklyudov22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-neklyudov22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kirill
    family: Neklyudov
  - given: Max
    family: Welling
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5790-5814
  id: neklyudov22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5790
  lastpage: 5814
  published: 2022-05-03 00:00:00 +0000
- title: ' PAC Mode Estimation using PPR Martingale Confidence Sequences '
  abstract: ' We consider the problem of correctly identifying the mode of a discrete distribution $\mathcal{P}$ with sufficiently high probability by observing a sequence of i.i.d. samples drawn from $\mathcal{P}$. This problem reduces to the estimation of a single parameter when $\mathcal{P}$ has a support set of size $K = 2$. After noting that this special case is handled very well by prior-posterior-ratio (PPR) martingale confidence sequences (Waudby-Smith and Ramdas, 2020), we propose a generalisation to mode estimation, in which $\mathcal{P}$ may take $K \geq 2$ values. To begin, we show that the "one-versus-one" principle to generalise from $K = 2$ to $K \geq 2$ classes is more efficient than the "one-versus-rest" alternative. We then prove that our resulting stopping rule, denoted PPR-1v1, is asymptotically optimal (as the mistake probability is taken to 0). PPR-1v1 is simple and computationally light, and incurs significantly fewer samples than competitors even in the non-asymptotic regime. We demonstrate its gains in two practical applications of sampling: election forecasting and verification of smart contracts in blockchains. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/anand-jain22a.html
  PDF: https://proceedings.mlr.press/v151/anand-jain22a/anand-jain22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-anand-jain22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shubham
    family: Anand Jain
  - given: Rohan
    family: Shah
  - given: Sanit
    family: Gupta
  - given: Denil
    family: Mehta
  - given: Inderjeet J.
    family: Nair
  - given: Jian
    family: Vora
  - given: Sushil
    family: Khyalia
  - given: Sourav
    family: Das
  - given: Vinay J.
    family: Ribeiro
  - given: Shivaram
    family: Kalyanakrishnan
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5815-5852
  id: anand-jain22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5815
  lastpage: 5852
  published: 2022-05-03 00:00:00 +0000
- title: ' Harmless interpolation in regression and classification with structured features '
  abstract: ' Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data. Inspired by this empirical observation, recent work has sought to understand this phenomenon of benign overfitting or harmless interpolation in the much simpler linear model. Previous theoretical work critically assumes that either the data features are statistically independent or the input data is high-dimensional; this precludes general nonparametric settings with structured feature maps. In this paper, we present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space. A key contribution is that our framework describes precise sufficient conditions on the data Gram matrix under which harmless interpolation occurs. Our results recover prior independent-features results (with a much simpler analysis), but they furthermore show that harmless interpolation can occur in more general settings such as features that are a bounded orthonormal system. Furthermore, our results show an asymptotic separation between classification and regression performance in a manner that was previously only shown for Gaussian features. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mcrae22a.html
  PDF: https://proceedings.mlr.press/v151/mcrae22a/mcrae22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mcrae22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Andrew D.
    family: McRae
  - given: Santhosh
    family: Karnik
  - given: Mark
    family: Davenport
  - given: Vidya K.
    family: Muthukumar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5853-5875
  id: mcrae22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5853
  lastpage: 5875
  published: 2022-05-03 00:00:00 +0000
- title: ' Masked Training of Neural Networks with Partial Gradients '
  abstract: ' State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD updates to a subset of parameters for increased efficiency (such as meProp) or a combination of both (such as Dropout). However, the convergence of these methods is often not studied in theory. We propose a unified theoretical framework to study such SGD variants—encompassing the aforementioned algorithms and additionally a broad variety of methods used for communication-efficient training or model compression. Our insights can be used as a guide to improve the efficiency of such methods and facilitate generalization to new applications. As an example, we tackle the task of jointly training networks, a version of which (limited to sub-networks) is used to create Slimmable Networks. By training a low-rank Transformer jointly with a standard one we obtain better performance than when it is trained separately. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mohtashami22a.html
  PDF: https://proceedings.mlr.press/v151/mohtashami22a/mohtashami22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mohtashami22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Amirkeivan
    family: Mohtashami
  - given: Martin
    family: Jaggi
  - given: Sebastian
    family: Stich
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5876-5890
  id: mohtashami22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5876
  lastpage: 5890
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning in Stochastic Monotone Games with Decision-Dependent Data '
  abstract: ' Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers’ actions. This paper formulates a new game-theoretic framework for this phenomenon, called multi-player performative prediction. We establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative-free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/narang22a.html
  PDF: https://proceedings.mlr.press/v151/narang22a/narang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-narang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Adhyyan
    family: Narang
  - given: Evan
    family: Faulkner
  - given: Dmitriy
    family: Drusvyatskiy
  - given: Maryam
    family: Fazel
  - given: Lillian
    family: Ratliff
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5891-5912
  id: narang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5891
  lastpage: 5912
  published: 2022-05-03 00:00:00 +0000
- title: ' On Combining Bags to Better Learn from Label Proportions '
  abstract: ' In the framework of learning from label proportions (LLP) the goal is to learn a good instance-level label predictor from the observed label proportions of bags of instances. Most of the LLP algorithms either explicitly or implicitly assume the nature of bag distributions with respect to the actual labels and instances, or cleverly adapt supervised learning techniques to suit LLP. In practical applications, however, the scale and nature of data could render such assumptions invalid and many of the algorithms impractical. In this paper we address the hard problem of solving LLP with provable error bounds while being bag distribution agnostic and model agnostic. We first propose the concept of generalized bags, an extension of bags, and then devise an algorithm to combine bag distributions, if possible, into good generalized bag distributions. We show that (w.h.p.) any classifier optimizing the squared Euclidean label-proportion loss on such a generalized bag distribution is guaranteed to minimize the instance-level loss as well. The predictive quality of our method is experimentally evaluated and it equals or betters the previous methods on pseudo-synthetic and real-world datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/saket22a.html
  PDF: https://proceedings.mlr.press/v151/saket22a/saket22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-saket22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rishi
    family: Saket
  - given: Aravindan
    family: Raghuveer
  - given: Balaraman
    family: Ravindran
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5913-5927
  id: saket22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5913
  lastpage: 5927
  published: 2022-05-03 00:00:00 +0000
- title: ' Meta Learning MDPs with linear transition models '
  abstract: ' We study meta-learning in Markov Decision Processes (MDPs) with linear transition models in the undiscounted episodic setting. Under a task sharedness metric based on model proximity we study task families characterized by a distribution over models specified by a bias term and a variance component. We then propose BUC-MatrixRL, a version of the UC-Matrix RL algorithm, and show it can meaningfully leverage a set of sampled training tasks to quickly solve a test task sampled from the same task distribution by learning an estimator of the bias parameter of the task distribution. The analysis leverages and extends results in the learning-to-learn linear regression and linear bandit setting to the more general case of MDPs with linear transition models. We prove that compared to learning the tasks in isolation, BUC-MatrixRL provides significant improvements in the transfer regret for high-bias, low-variance task distributions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/muller22a.html
  PDF: https://proceedings.mlr.press/v151/muller22a/muller22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-muller22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Robert
    family: Müller
  - given: Aldo
    family: Pacchiano
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5928-5948
  id: muller22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5928
  lastpage: 5948
  published: 2022-05-03 00:00:00 +0000
- title: ' Adaptation of the Independent Metropolis-Hastings Sampler with Normalizing Flow Proposals '
  abstract: ' Markov Chain Monte Carlo (MCMC) methods are a powerful tool for computation with complex probability distributions. However, the performance of such methods is critically dependent on properly tuned parameters, most of which are difficult if not impossible to know a priori for a given target distribution. Adaptive MCMC methods aim to address this by allowing the parameters to be updated during sampling based on previous samples from the chain, at the expense of requiring a new theoretical analysis to ensure convergence. In this work we extend the convergence theory of adaptive MCMC methods to a new class of methods built on a powerful class of parametric density estimators known as normalizing flows. In particular, we consider an independent Metropolis-Hastings sampler where the proposal distribution is represented by a normalizing flow whose parameters are updated using stochastic gradient descent. We explore the practical performance of this procedure both on synthetic settings and in the analysis of a physical field system, and compare it against both adaptive and non-adaptive MCMC methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/brofos22a.html
  PDF: https://proceedings.mlr.press/v151/brofos22a/brofos22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-brofos22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: James
    family: Brofos
  - given: Marylou
    family: Gabrie
  - given: Marcus A.
    family: Brubaker
  - given: Roy R.
    family: Lederman
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5949-5986
  id: brofos22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5949
  lastpage: 5986
  published: 2022-05-03 00:00:00 +0000
- title: ' Permutation Equivariant Layers for Higher Order Interactions '
  abstract: ' Recent work on permutation equivariant neural networks has mostly focused on the first order case (sets) and second order case (graphs). We describe the machinery for generalizing permutation equivariance to arbitrary $k$-ary interactions between entities for any value of $k$. We demonstrate the effectiveness of higher order permutation equivariant models on several real-world applications and find that our results compare favorably to existing permutation invariant/equivariant baselines. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/pan22a.html
  PDF: https://proceedings.mlr.press/v151/pan22a/pan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-pan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Horace
    family: Pan
  - given: Risi
    family: Kondor
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 5987-6001
  id: pan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 5987
  lastpage: 6001
  published: 2022-05-03 00:00:00 +0000
- title: ' Efficient Online Bayesian Inference for Neural Bandits '
  abstract: ' In this paper we present a new algorithm for online (sequential) inference in Bayesian neural networks, and show its suitability for tackling contextual bandit problems. The key idea is to combine the extended Kalman filter (which locally linearizes the likelihood function at each time step) with a (learned or random) low-dimensional affine subspace for the parameters; the use of a subspace enables us to scale our algorithm to models with $\sim 1M$ parameters. While most other neural bandit methods need to store the entire past dataset in order to avoid the problem of “catastrophic forgetting”, our approach uses constant memory. This is possible because we represent uncertainty about all the parameters in the model, not just the final linear layer. We show good results on the “Deep Bayesian Bandit Showdown” benchmark, as well as MNIST and a recommender system. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/duran-martin22a.html
  PDF: https://proceedings.mlr.press/v151/duran-martin22a/duran-martin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-duran-martin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Gerardo
    family: Duran-Martin
  - given: Aleyna
    family: Kara
  - given: Kevin
    family: Murphy
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6002-6021
  id: duran-martin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6002
  lastpage: 6021
  published: 2022-05-03 00:00:00 +0000
- title: ' Parameter-Free Online Linear Optimization with Side Information via Universal Coin Betting '
  abstract: ' A class of parameter-free online linear optimization algorithms is proposed that harnesses the structure of an adversarial sequence by adapting to some side information. These algorithms combine the reduction technique of Orabona and Pal (2016) for adapting coin betting algorithms for online linear optimization with universal compression techniques in information theory for incorporating sequential side information to coin betting. Concrete examples are studied in which the side information has a tree structure and consists of quantized values of the previous symbols of the adversarial sequence, including fixed-order and variable-order Markov cases. By modifying the context-tree weighting technique of Willems, Shtarkov, and Tjalkens (1995), the proposed algorithm is further refined to achieve the best performance over all adaptive algorithms with tree-structured side information of a given maximum order in a computationally efficient manner. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ryu22a.html
  PDF: https://proceedings.mlr.press/v151/ryu22a/ryu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ryu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jongha J.
    family: Ryu
  - given: Alankrita
    family: Bhatt
  - given: Young-Han
    family: Kim
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6022-6044
  id: ryu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6022
  lastpage: 6044
  published: 2022-05-03 00:00:00 +0000
- title: ' Performative Prediction in a Stateful World '
  abstract: ' Deployed supervised machine learning models make predictions that interact with and influence the world. This phenomenon is called <em>performative prediction</em> by Perdomo et al. (ICML 2020). It is an ongoing challenge to understand the influence of such predictions as well as design tools so as to control that influence. We propose a theoretical framework where the response of a target population to the deployed classifier is modeled as a function of the classifier and the current state (distribution) of the population. We show necessary and sufficient conditions for convergence to an equilibrium of two retraining algorithms, <em>repeated risk minimization</em> and a lazier variant. Furthermore, convergence is near an optimal classifier. We thus generalize results of Perdomo et al., whose performativity framework does not assume any dependence on the state of the target population. A particular phenomenon captured by our model is that of distinct groups that acquire information and resources at different rates to be able to respond to the latest deployed classifier. We study this phenomenon theoretically and empirically. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/brown22a.html
  PDF: https://proceedings.mlr.press/v151/brown22a/brown22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-brown22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Gavin
    family: Brown
  - given: Shlomi
    family: Hod
  - given: Iden
    family: Kalemaj
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6045-6061
  id: brown22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6045
  lastpage: 6061
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Revenue-Maximizing Auctions With Differentiable Matching '
  abstract: ' We propose a new architecture to approximately learn incentive compatible, revenue-maximizing auctions from sampled valuations. Our architecture uses the Sinkhorn algorithm to perform a differentiable bipartite matching which allows the network to learn strategyproof revenue-maximizing mechanisms in settings not learnable by the previous RegretNet architecture. In particular, our architecture is able to learn mechanisms in settings without free disposal where each bidder must be allocated exactly some number of items. In experiments, we show our approach successfully recovers multiple known optimal mechanisms and high-revenue, low-regret mechanisms in larger settings where the optimal mechanism is unknown. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/curry22a.html
  PDF: https://proceedings.mlr.press/v151/curry22a/curry22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-curry22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Michael J.
    family: Curry
  - given: Uro
    family: Lyi
  - given: Tom
    family: Goldstein
  - given: John P.
    family: Dickerson
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6062-6073
  id: curry22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6062
  lastpage: 6073
  published: 2022-05-03 00:00:00 +0000
- title: ' Orthogonal Multi-Manifold Enriching of Directed Networks '
  abstract: ' Directed Acyclic Graphs and trees are widely prevalent in several real-world applications. These hierarchical structures show intriguing properties such as scale-free and bipartite nature, with fine-grained temporal irregularities among nodes. Building on advances in geometrical deep learning, we explore a time-aware neural network to model trees and Directed Acyclic Graphs in multiple Riemannian manifolds of varying curvatures. To jointly utilize the strength of these manifolds, we propose <b>M</b>ulti-Manifold <b>R</b>ecursive <b>I</b>nteraction <b>L</b>earning (<b>MRIL</b>) on Directed Acyclic Graphs where we introduce an inter-manifold learning mechanism that recursively enriches each manifold with representations from sibling manifolds. We propose the integration of the Stiefel orthogonality constraint which stabilizes the training process in Riemannian manifolds. Through a series of quantitative and exploratory experiments, we show that our method achieves competitive performance and converges much faster on data spanning several domains. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sawhney22a.html
  PDF: https://proceedings.mlr.press/v151/sawhney22a/sawhney22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sawhney22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ramit
    family: Sawhney
  - given: Shivam
    family: Agarwal
  - given: Atula T.
    family: Neerkaje
  - given: Kapil
    family: Jayesh Pathak
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6074-6086
  id: sawhney22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6074
  lastpage: 6086
  published: 2022-05-03 00:00:00 +0000
- title: ' Estimating Functionals of the Out-of-Sample Error Distribution in High-Dimensional Ridge Regression '
  abstract: ' We study the problem of estimating the distribution of the out-of-sample prediction error associated with ridge regression. In contrast, the traditional object of study is the uncentered second moment of this distribution (the mean squared prediction error), which can be estimated using cross-validation methods. We show that both generalized and leave-one-out cross-validation (GCV and LOOCV) for ridge regression can be suitably extended to estimate the full error distribution. This is still possible in a high-dimensional setting where the ridge regularization parameter is zero. In an asymptotic framework in which the feature dimension and sample size grow proportionally, we prove that almost surely, with respect to the training data, our estimators (extensions of GCV and LOOCV) converge weakly to the true out-of-sample error distribution. This result requires mild assumptions on the response and feature distributions. We also establish a more general result that allows us to estimate certain functionals of the error distribution, both linear and nonlinear. This yields various applications, including consistent estimation of the quantiles of the out-of-sample error distribution, which gives rise to prediction intervals with asymptotically exact coverage conditional on the training data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/patil22a.html
  PDF: https://proceedings.mlr.press/v151/patil22a/patil22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-patil22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pratik
    family: Patil
  - given: Alessandro
    family: Rinaldo
  - given: Ryan
    family: Tibshirani
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6087-6120
  id: patil22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6087
  lastpage: 6120
  published: 2022-05-03 00:00:00 +0000
- title: ' The Tree Loss: Improving Generalization with Many Classes '
  abstract: ' Multi-class classification problems often have many semantically similar classes. For example, 90 of ImageNet’s 1000 classes are for different breeds of dog. We should expect that these semantically similar classes will have similar parameter vectors, but the standard cross entropy loss does not enforce this constraint. We introduce the tree loss as a drop-in replacement for the cross entropy loss. The tree loss re-parameterizes the parameter matrix in order to guarantee that semantically similar classes will have similar parameter vectors. Using simple properties of stochastic gradient descent, we show that the tree loss’s generalization error is asymptotically better than the cross entropy loss’s. We then validate these theoretical results on synthetic data, image data (CIFAR100, ImageNet), and text data (Twitter). '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22d.html
  PDF: https://proceedings.mlr.press/v151/wang22d/wang22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yujie
    family: Wang
  - given: Mike
    family: Izbicki
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6121-6133
  id: wang22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6121
  lastpage: 6133
  published: 2022-05-03 00:00:00 +0000
- title: ' Double Control Variates for Gradient Estimation in Discrete Latent Variable Models '
  abstract: ' Stochastic gradient-based optimisation for discrete latent variable models is challenging due to the high variance of gradients. We introduce a variance reduction technique for score function estimators that makes use of double control variates. These control variates act on top of a main control variate, and try to further reduce the variance of the overall estimator. We develop a double control variate for the REINFORCE leave-one-out estimator using Taylor expansions. For training discrete latent variable models, such as variational autoencoders with binary latent variables, our approach adds no extra computational cost compared to standard training with the REINFORCE leave-one-out estimator. We apply our method to challenging high-dimensional toy examples and for training variational autoencoders with binary latent variables. We show that our estimator can have lower variance compared to other state-of-the-art estimators. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/titsias22a.html
  PDF: https://proceedings.mlr.press/v151/titsias22a/titsias22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-titsias22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Michalis
    family: Titsias
  - given: Jiaxin
    family: Shi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6134-6151
  id: titsias22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6134
  lastpage: 6151
  published: 2022-05-03 00:00:00 +0000
- title: ' Transfer Learning with Gaussian Processes for Bayesian Optimization '
  abstract: ' Bayesian optimization is a powerful paradigm to optimize black-box functions based on scarce and noisy data. Its data efficiency can be further improved by transfer learning from related tasks. While recent transfer models meta-learn a prior based on a large amount of data, in the low-data regime, methods that exploit the closed-form posterior of Gaussian processes (GPs) have an advantage. In this setting, several analytically tractable transfer-model posteriors have been proposed, but the relative advantages of these methods are not well understood. In this paper, we provide a unified view on hierarchical GP models for transfer learning, which allows us to analyze the relationship between methods. As part of the analysis, we develop a novel closed-form boosted GP transfer model that fits between existing approaches in terms of complexity. We evaluate the performance of the different approaches in large-scale experiments and highlight strengths and weaknesses of the different transfer-learning methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tighineanu22a.html
  PDF: https://proceedings.mlr.press/v151/tighineanu22a/tighineanu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tighineanu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Petru
    family: Tighineanu
  - given: Kathrin
    family: Skubch
  - given: Paul
    family: Baireuther
  - given: Attila
    family: Reiss
  - given: Felix
    family: Berkenkamp
  - given: Julia
    family: Vinogradska
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6152-6181
  id: tighineanu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6152
  lastpage: 6181
  published: 2022-05-03 00:00:00 +0000
- title: ' Conditional Linear Regression for Heterogeneous Covariances '
  abstract: ' Often machine learning and statistical models will attempt to describe the majority of the data. However, there may be situations where only a fraction of the data can be fit well by a linear regression model. Here, we are interested in a case where such inliers can be identified by a Disjunctive Normal Form (DNF) formula. We give a polynomial time algorithm for the conditional linear regression task, which identifies a DNF condition together with the linear predictor on the corresponding portion of the data. In this work, we improve on previous algorithms by removing a requirement that the covariances of the data satisfying each of the terms of the condition have to all be very similar in spectral norm to the covariance of the overall condition. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liang22a.html
  PDF: https://proceedings.mlr.press/v151/liang22a/liang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Leda
    family: Liang
  - given: Brendan
    family: Juba
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6182-6199
  id: liang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6182
  lastpage: 6199
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Competitive Equilibria in Exchange Economies with Bandit Feedback '
  abstract: ' The sharing of scarce resources among multiple rational agents is one of the classical problems in economics. In exchange economies, which are used to model such situations, agents begin with an initial endowment of resources and exchange them in a way that is mutually beneficial until they reach a competitive equilibrium (CE). The allocations at a CE are Pareto efficient and fair. Consequently, they are used widely in designing mechanisms for fair division. However, computing CEs requires the knowledge of agent preferences which are unknown in several applications of interest. In this work, we explore a new online learning mechanism, which, on each round, allocates resources to the agents and collects stochastic feedback on their experience in using that allocation. Its goal is to learn the agent utilities via this feedback and imitate the allocations at a CE in the long run. We quantify CE behavior via two losses and propose a randomized algorithm which achieves sublinear loss under a parametric class of utilities. Empirically, we demonstrate the effectiveness of this mechanism through numerical simulations. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/guo22a.html
  PDF: https://proceedings.mlr.press/v151/guo22a/guo22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-guo22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Wenshuo
    family: Guo
  - given: Kirthevasan
    family: Kandasamy
  - given: Joseph
    family: Gonzalez
  - given: Michael
    family: Jordan
  - given: Ion
    family: Stoica
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6200-6224
  id: guo22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6200
  lastpage: 6224
  published: 2022-05-03 00:00:00 +0000
- title: ' Node Feature Kernels Increase Graph Convolutional Network Robustness '
  abstract: ' The robustness of the much used Graph Convolutional Networks (GCNs) to perturbations of their input is becoming a topic of increasing importance. In this paper the random GCN is introduced for which a random matrix theory analysis is possible. This analysis suggests that if the graph is sufficiently perturbed, or in the extreme case random, then the GCN fails to benefit from the node features. It is furthermore observed that enhancing the message passing step in GCNs by adding the node feature kernel to the adjacency matrix of the graph structure solves this problem. An empirical study of a GCN utilised for node classification on six real datasets further confirms the theoretical findings and demonstrates that perturbations of the graph structure can result in GCNs performing significantly worse than Multi-Layer Perceptrons run on the node features alone. In practice, adding a node feature kernel to the message passing of perturbed graphs results in a significant improvement of the GCN’s performance, thereby rendering it more robust to graph perturbations. Our code is publicly available at: https://github.com/ChangminWu/RobustGCN. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/el-amine-seddik22a.html
  PDF: https://proceedings.mlr.press/v151/el-amine-seddik22a/el-amine-seddik22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-el-amine-seddik22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mohamed
    family: El Amine Seddik
  - given: Changmin
    family: Wu
  - given: Johannes F.
    family: Lutzeyer
  - given: Michalis
    family: Vazirgiannis
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6225-6241
  id: el-amine-seddik22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6225
  lastpage: 6241
  published: 2022-05-03 00:00:00 +0000
- title: ' Top K Ranking for Multi-Armed Bandit with Noisy Evaluations '
  abstract: ' We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy independent, and possibly biased, evaluations of the true reward of each arm and it selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds. Under the assumption that at each round the true reward of each arm is drawn from a fixed distribution, we derive different algorithmic approaches and theoretical guarantees depending on how the evaluations are generated. First, we show a $\widetilde{O}(T^{2/3})$ regret in the general case when the observation functions are generalized linear functions of the true rewards. On the other hand, we show that an improved $\widetilde{O}(\sqrt{T})$ regret can be derived when the observation functions are noisy linear functions of the true rewards. Finally, we report an empirical validation that confirms our theoretical findings, provides a thorough comparison to alternative approaches, and further supports the interest of this setting in practice. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/garcelon22b.html
  PDF: https://proceedings.mlr.press/v151/garcelon22b/garcelon22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-garcelon22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Evrard
    family: Garcelon
  - given: Vashist
    family: Avadhanula
  - given: Alessandro
    family: Lazaric
  - given: Matteo
    family: Pirotta
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6242-6269
  id: garcelon22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6242
  lastpage: 6269
  published: 2022-05-03 00:00:00 +0000
- title: ' Differential privacy for symmetric log-concave mechanisms '
  abstract: ' Adding random noise to database query results is an important tool for achieving privacy. A challenge is to minimize this noise while still meeting privacy requirements. Recently, a sufficient and necessary condition for $(\epsilon, \delta)$-differential privacy for Gaussian noise was published. This condition allows the computation of the minimum privacy-preserving scale for this distribution. We extend this work and provide a sufficient and necessary condition for $(\epsilon, \delta)$-differential privacy for all symmetric and log-concave noise densities. Our results allow fine-grained tailoring of the noise distribution to the dimensionality of the query result. We demonstrate that this can yield significantly lower mean squared errors than those incurred by the currently used Laplace and Gaussian mechanisms for the same $\epsilon$ and $\delta$. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/vinterbo22a.html
  PDF: https://proceedings.mlr.press/v151/vinterbo22a/vinterbo22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-vinterbo22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Staal A.
    family: Vinterbo
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6270-6291
  id: vinterbo22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6270
  lastpage: 6291
  published: 2022-05-03 00:00:00 +0000
- title: ' Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization '
  abstract: ' Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers. While error feedback compression has been proven to be successful in reducing communication costs with stochastic gradient descent (SGD), there are far fewer attempts at building communication-efficient adaptive gradient methods with provable guarantees, which are widely used in training large-scale machine learning models. In this paper, we propose a new communication-compressed AMSGrad for distributed nonconvex optimization problems, which is provably efficient. Our proposed distributed learning framework features an effective gradient compression strategy and a worker-side model update design. We prove that the proposed communication-efficient distributed adaptive gradient method converges to the first-order stationary point with the same iteration complexity as uncompressed vanilla AMSGrad in the stochastic nonconvex optimization setting. Experiments on various benchmarks back up our theory. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22e.html
  PDF: https://proceedings.mlr.press/v151/wang22e/wang22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yujia
    family: Wang
  - given: Lu
    family: Lin
  - given: Jinghui
    family: Chen
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6292-6320
  id: wang22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6292
  lastpage: 6320
  published: 2022-05-03 00:00:00 +0000
- title: ' Fair Disaster Containment via Graph-Cut Problems '
  abstract: ' Graph cut problems are fundamental in combinatorial optimization, and are a central object of study in both theory and practice. Further, the study of fairness in algorithmic design and machine learning has recently received significant attention, with many different notions proposed and analyzed for a variety of contexts. In this paper we initiate the study of fairness for graph cut problems by giving the first fair definitions for them, and subsequently we demonstrate appropriate algorithmic techniques that yield a rigorous theoretical analysis. Specifically, we incorporate two different notions of fairness, namely demographic and probabilistic individual fairness, in a particular cut problem that models disaster containment scenarios. Our results include a variety of approximation algorithms with provable theoretical guarantees. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dinitz22a.html
  PDF: https://proceedings.mlr.press/v151/dinitz22a/dinitz22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dinitz22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Michael
    family: Dinitz
  - given: Aravind
    family: Srinivasan
  - given: Leonidas
    family: Tsepenekas
  - given: Anil
    family: Vullikanti
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6321-6333
  id: dinitz22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6321
  lastpage: 6333
  published: 2022-05-03 00:00:00 +0000
- title: ' Provable Lifelong Learning of Representations '
  abstract: ' In lifelong learning, tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension $d$, $m$ tasks with $k$ features up to error $\epsilon$ is $\tilde{O}(dk^{1.5}/\epsilon+km/\epsilon)$. We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. We complement our analysis with an empirical study, including a heuristic lifelong learning algorithm for deep neural networks. Our method performs favorably on challenging realistic image datasets compared to state-of-the-art continual learning methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cao22a.html
  PDF: https://proceedings.mlr.press/v151/cao22a/cao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xinyuan
    family: Cao
  - given: Weiyang
    family: Liu
  - given: Santosh
    family: Vempala
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6334-6356
  id: cao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6334
  lastpage: 6356
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits '
  abstract: ' We introduce the “inverse bandit” problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy, and thereby suffer from an identifiability issue. In contrast, we propose to leverage the demonstrator’s behavior en route to optimality, and in particular, the exploration phase, for reward estimation. We begin by establishing a general information-theoretic lower bound under this paradigm that applies to any demonstrator algorithm, which characterizes a fundamental tradeoff between reward estimation and the amount of exploration of the demonstrator. Then, we develop simple and efficient reward estimators for upper-confidence-based demonstrator algorithms that attain the optimal tradeoff, showing in particular that consistent reward estimation—free of identifiability issues—is possible under our paradigm. Extensive simulations on both synthetic and semi-synthetic data corroborate our theoretical results. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/guo22b.html
  PDF: https://proceedings.mlr.press/v151/guo22b/guo22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-guo22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Wenshuo
    family: Guo
  - given: Kumar
    family: Krishna Agrawal
  - given: Aditya
    family: Grover
  - given: Vidya K.
    family: Muthukumar
  - given: Ashwin
    family: Pananjady
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6357-6386
  id: guo22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6357
  lastpage: 6386
  published: 2022-05-03 00:00:00 +0000
- title: ' A New Notion of Individually Fair Clustering: $\alpha$-Equitable $k$-Center '
  abstract: ' Clustering is a fundamental problem in unsupervised machine learning, and due to its numerous societal implications fair variants of it have recently received significant attention. In this work we introduce a novel definition of individual fairness for clustering problems. Specifically, in our model, each point $j$ has a set of other points $\mathcal{S}_j$ that it perceives as similar to itself, and it feels that it is being fairly treated if the quality of service it receives in the solution is $\alpha$-close (in a multiplicative sense, for some given $\alpha \geq 1$) to that of the points in $\mathcal{S}_j$. We begin our study by answering questions regarding the combinatorial structure of the problem, namely for what values of $\alpha$ the problem is well-defined, and what the behavior of the Price of Fairness (PoF) for it is. For the well-defined region of $\alpha$, we provide efficient and easily-implementable approximation algorithms for the $k$-center objective, which in certain cases also enjoy bounded-PoF guarantees. We finally complement our analysis by an extensive suite of experiments that validates the effectiveness of our theoretical results. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chakrabarti22a.html
  PDF: https://proceedings.mlr.press/v151/chakrabarti22a/chakrabarti22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chakrabarti22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Darshan
    family: Chakrabarti
  - given: John P.
    family: Dickerson
  - given: Seyed A.
    family: Esmaeili
  - given: Aravind
    family: Srinivasan
  - given: Leonidas
    family: Tsepenekas
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6387-6408
  id: chakrabarti22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6387
  lastpage: 6408
  published: 2022-05-03 00:00:00 +0000
- title: ' Iterative Alignment Flows '
  abstract: ' The unsupervised task of aligning two or more distributions in a shared latent space has many applications including fair representations, batch effect mitigation, and unsupervised domain adaptation. Existing flow-based approaches estimate multiple flows independently, which is equivalent to learning multiple full generative models. Other approaches require adversarial learning, which can be computationally expensive and challenging to optimize. Thus, we aim to jointly align multiple distributions while avoiding adversarial learning. Inspired by efficient alignment algorithms from optimal transport (OT) theory for univariate distributions, we develop a simple iterative method to build deep and expressive flows. Our method decouples each iteration into two subproblems: 1) form a variational approximation of a distribution divergence and 2) minimize this variational approximation via closed-form invertible alignment maps based on known OT results. Our empirical results give evidence that this iterative algorithm achieves competitive distribution alignment at low computational cost while being able to naturally handle more than two distributions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhou22b.html
  PDF: https://proceedings.mlr.press/v151/zhou22b/zhou22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhou22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zeyu
    family: Zhou
  - given: Ziyu
    family: Gong
  - given: Pradeep
    family: Ravikumar
  - given: David I.
    family: Inouye
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6409-6444
  id: zhou22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6409
  lastpage: 6444
  published: 2022-05-03 00:00:00 +0000
- title: ' Fast Fourier Transform Reductions for Bayesian Network Inference '
  abstract: ' Bayesian Networks are useful for analyzing the properties of systems with large populations of interacting agents (e.g., in social modeling applications and distributed service applications). These networks typically have large functions (CPTs), making exact inference intractable. However, often these models have additive symmetry. In this paper we show how summation-based CPTs, especially in the presence of symmetry, can be computed efficiently through the usage of the Fast Fourier Transform (FFT). In particular, we propose an efficient method using the FFT for reducing the size of Conditional Probability Tables (CPTs) in Bayesian Networks with summation-based causal independence (CI). We then show how to apply this reduction directly towards the acceleration of Bucket Elimination, and we subsequently provide experimental results demonstrating the computational speedup provided by our method. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hsiao22a.html
  PDF: https://proceedings.mlr.press/v151/hsiao22a/hsiao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hsiao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vincent
    family: Hsiao
  - given: Dana
    family: Nau
  - given: Rina
    family: Dechter
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6445-6458
  id: hsiao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6445
  lastpage: 6458
  published: 2022-05-03 00:00:00 +0000
- title: ' QLSD: Quantised Langevin Stochastic Dynamics for Bayesian Federated Learning '
  abstract: ' The objective of Federated Learning (FL) is to perform statistical inference for data which are decentralised and stored locally on networked clients. FL raises many constraints which include privacy and data ownership, communication overhead, statistical heterogeneity, and partial client participation. In this paper, we address these problems in the framework of the Bayesian paradigm. To this end, we propose a novel federated Markov Chain Monte Carlo algorithm, referred to as Quantised Langevin Stochastic Dynamics which may be seen as an extension to the FL setting of Stochastic Gradient Langevin Dynamics, which handles the communication bottleneck using gradient compression. To improve performance, we then introduce variance reduction techniques, which lead to two improved versions coined QLSD$^\star$ and QLSD$^{++}$. We give both non-asymptotic and asymptotic convergence guarantees for the proposed algorithms. We illustrate their performances using various Bayesian Federated Learning benchmarks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/vono22a.html
  PDF: https://proceedings.mlr.press/v151/vono22a/vono22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-vono22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Maxime
    family: Vono
  - given: Vincent
    family: Plassier
  - given: Alain
    family: Durmus
  - given: Aymeric
    family: Dieuleveut
  - given: Eric
    family: Moulines
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6459-6500
  id: vono22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6459
  lastpage: 6500
  published: 2022-05-03 00:00:00 +0000
- title: ' Lifted Division for Lifted Hugin Belief Propagation '
  abstract: ' The lifted junction tree algorithm (LJT) is an inference algorithm that allows for tractable inference regarding domain sizes. To answer multiple queries efficiently, it decomposes a first-order input model into a first-order junction tree. During inference, degrees of belief are propagated through the tree. This propagation significantly contributes to the runtime complexity not just of LJT but of any tree-based inference algorithm. We present a lifted propagation scheme based on the so-called Hugin scheme whose runtime complexity is independent of the degree of the tree. Thereby, lifted Hugin can achieve asymptotic speed improvements over the existing lifted Shafer-Shenoy propagation. An empirical evaluation confirms these results. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hoffmann22a.html
  PDF: https://proceedings.mlr.press/v151/hoffmann22a/hoffmann22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hoffmann22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Moritz P.
    family: Hoffmann
  - given: Tanya
    family: Braun
  - given: Ralf
    family: Möller
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6501-6510
  id: hoffmann22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6501
  lastpage: 6510
  published: 2022-05-03 00:00:00 +0000
- title: ' Proximal Optimal Transport Modeling of Population Dynamics '
  abstract: ' We propose a new approach to model the collective dynamics of a population of particles evolving with time. As is often the case in challenging scientific applications, notably single-cell genomics, measuring features for these particles requires destroying them. As a result, the population can only be monitored with periodic snapshots, obtained by sampling a few particles that are sacrificed in exchange for measurements. Given only access to these snapshots, can we reconstruct likely individual trajectories for all other particles? We propose to model these trajectories as collective realizations of a causal Jordan-Kinderlehrer-Otto (JKO) flow of measures: The JKO scheme posits that the new configuration taken by a population at time t+1 is one that trades off an improvement, in the sense that it decreases an energy, while remaining close (in Wasserstein distance) to the previous configuration observed at t. In order to learn such an energy using only snapshots, we propose JKOnet, a neural architecture that computes (in end-to-end differentiable fashion) the JKO flow given a parametric energy and initial configuration of points. We demonstrate the good performance and robustness of the JKOnet fitting procedure, compared to a more direct forward method. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bunne22a.html
  PDF: https://proceedings.mlr.press/v151/bunne22a/bunne22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bunne22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Charlotte
    family: Bunne
  - given: Laetitia
    family: Papaxanthos
  - given: Andreas
    family: Krause
  - given: Marco
    family: Cuturi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6511-6528
  id: bunne22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6511
  lastpage: 6528
  published: 2022-05-03 00:00:00 +0000
- title: ' Asynchronous Upper Confidence Bound Algorithms for Federated Linear Bandits '
  abstract: ' The linear contextual bandit is a popular online learning problem. It has been mostly studied in centralized learning settings. With the surging demand for large-scale decentralized model learning, e.g., federated learning, how to retain regret minimization while reducing communication cost becomes an open challenge. In this paper, we study the linear contextual bandit in a federated learning setting. We propose a general framework with asynchronous model update and communication for a collection of homogeneous clients and heterogeneous clients, respectively. Rigorous theoretical analysis is provided about the regret and communication cost under this distributed learning framework, and extensive empirical evaluations demonstrate the effectiveness of our solution. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/li22e.html
  PDF: https://proceedings.mlr.press/v151/li22e/li22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-li22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chuanhao
    family: Li
  - given: Hongning
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6529-6553
  id: li22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6529
  lastpage: 6553
  published: 2022-05-03 00:00:00 +0000
- title: ' Efficient Hyperparameter Tuning for Large Scale Kernel Ridge Regression '
  abstract: ' Kernel methods provide a principled approach to nonparametric learning. While their basic implementations scale poorly to large problems, recent advances showed that approximate solvers can efficiently handle massive datasets. A shortcoming of these solutions is that hyperparameter tuning is not taken care of, and left for the user to perform. Hyperparameters are crucial in practice and the lack of automated tuning greatly hinders efficiency and usability. In this paper, we work to fill in this gap focusing on kernel ridge regression based on the Nyström approximation. After reviewing and contrasting a number of hyperparameter tuning strategies, we propose a complexity regularization criterion based on a data dependent penalty, and discuss its efficient optimization. Then, we proceed to a careful and extensive empirical evaluation highlighting strengths and weaknesses of the different tuning strategies. Our analysis shows the benefit of the proposed approach, that we hence incorporate in a library for large scale kernel methods to derive adaptively tuned solutions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/meanti22a.html
  PDF: https://proceedings.mlr.press/v151/meanti22a/meanti22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-meanti22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Giacomo
    family: Meanti
  - given: Luigi
    family: Carratino
  - given: Ernesto
    family: De Vito
  - given: Lorenzo
    family: Rosasco
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6554-6572
  id: meanti22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6554
  lastpage: 6572
  published: 2022-05-03 00:00:00 +0000
- title: ' Are All Linear Regions Created Equal? '
  abstract: ' The number of linear regions has been studied as a proxy of complexity for ReLU networks. However, the empirical success of network compression techniques like pruning and knowledge distillation suggests that in the overparameterized setting, linear regions density might fail to capture the effective nonlinearity. In this work, we propose an efficient algorithm for discovering linear regions and use it to investigate the effectiveness of density in capturing the nonlinearity of trained VGGs and ResNets on CIFAR-10 and CIFAR-100. We contrast the results with a more principled nonlinearity measure based on function variation, highlighting the shortcomings of linear regions density. Furthermore, interestingly, our measure of nonlinearity clearly correlates with model-wise deep double descent, connecting reduced test error with reduced nonlinearity, and increased local similarity of linear regions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/gamba22a.html
  PDF: https://proceedings.mlr.press/v151/gamba22a/gamba22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-gamba22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matteo
    family: Gamba
  - given: Adrian
    family: Chmielewski-Anders
  - given: Josephine
    family: Sullivan
  - given: Hossein
    family: Azizpour
  - given: Marten
    family: Bjorkman
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6573-6590
  id: gamba22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6573
  lastpage: 6590
  published: 2022-05-03 00:00:00 +0000
- title: ' Entrywise Recovery Guarantees for Sparse PCA via Sparsistent Algorithms '
  abstract: ' Sparse Principal Component Analysis (PCA) is a prevalent tool across a plethora of subfields of applied statistics. While several results have characterized the recovery error of the principal eigenvectors, these are typically in spectral or Frobenius norms. In this paper, we provide entrywise $\ell_{2,\infty}$ bounds for Sparse PCA under a general high-dimensional subgaussian design. In particular, our bounds hold for any algorithm that selects the correct support with high probability, namely those that are sparsistent. Our bound improves upon known results by providing a finer characterization of the estimation error, and our proof uses techniques recently developed for entrywise subspace perturbation theory. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/agterberg22a.html
  PDF: https://proceedings.mlr.press/v151/agterberg22a/agterberg22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-agterberg22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Joshua
    family: Agterberg
  - given: Jeremias
    family: Sulam
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6591-6629
  id: agterberg22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6591
  lastpage: 6629
  published: 2022-05-03 00:00:00 +0000
- title: ' An Alternate Policy Gradient Estimator for Softmax Policies '
  abstract: ' Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from bad policy initialization or sudden changes in the environment that occur after the policy has already converged. Current softmax PG estimators require a large number of updates to overcome policy saturation, which causes low sample efficiency and poor adaptability to new situations. To mitigate this problem, we propose a novel PG estimator for softmax policies that utilizes the bias in the critic estimate and the noise present in the reward signal to escape the saturated regions of the policy parameter space. Our theoretical analysis and experiments, conducted on bandits and various reinforcement learning environments, show that this new estimator is significantly more robust to policy saturation. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/garg22b.html
  PDF: https://proceedings.mlr.press/v151/garg22b/garg22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-garg22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shivam
    family: Garg
  - given: Samuele
    family: Tosatto
  - given: Yangchen
    family: Pan
  - given: Martha
    family: White
  - given: Rupam
    family: Mahmood
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6630-6689
  id: garg22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6630
  lastpage: 6689
  published: 2022-05-03 00:00:00 +0000
- title: ' Co-Regularized Adversarial Learning for Multi-Domain Text Classification '
  abstract: ' Multi-domain text classification (MDTC) aims to leverage all available resources from multiple domains to learn a predictive model that can generalize well on these domains. Recently, many MDTC methods adopt adversarial learning, shared-private paradigm, and entropy minimization to yield state-of-the-art results. However, these approaches face three issues: (1) Minimizing domain divergence cannot fully guarantee the success of domain alignment; (2) Aligning marginal feature distributions cannot fully guarantee the discriminability of the learned features; (3) Standard entropy minimization may make the predictions on unlabeled data over-confident, deteriorating the discriminability of the learned features. In order to address the above issues, we propose a co-regularized adversarial learning (CRAL) mechanism for MDTC. This approach constructs two diverse shared latent spaces, performs domain alignment in each of them, and punishes the disagreements of these two alignments with respect to the predictions on unlabeled data. Moreover, virtual adversarial training (VAT) with entropy minimization is incorporated to impose consistency regularization to the CRAL method. Experiments show that our model outperforms state-of-the-art methods on two MDTC benchmarks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wu22d.html
  PDF: https://proceedings.mlr.press/v151/wu22d/wu22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wu22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuan
    family: Wu
  - given: Diana
    family: Inkpen
  - given: Ahmed
    family: El-Roby
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6690-6701
  id: wu22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6690
  lastpage: 6701
  published: 2022-05-03 00:00:00 +0000
- title: ' Zeroth-Order Methods for Convex-Concave Min-max Problems: Applications to Decision-Dependent Risk Minimization '
  abstract: ' Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose the random reshuffling-based gradient-free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite sum structure. We prove that the algorithm enjoys the same convergence rate as that of zeroth-order algorithms for convex minimization problems. We deploy the algorithm to solve the distributionally robust strategic classification problem, where gradient information is not readily available, by reformulating the latter into a finite dimensional convex concave min-max problem. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions from the data sources, and outperforms existing methods from the strategic classification literature. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/maheshwari22a.html
  PDF: https://proceedings.mlr.press/v151/maheshwari22a/maheshwari22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-maheshwari22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chinmay
    family: Maheshwari
  - given: Chih-Yuan
    family: Chiu
  - given: Eric
    family: Mazumdar
  - given: Shankar
    family: Sastry
  - given: Lillian
    family: Ratliff
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6702-6734
  id: maheshwari22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6702
  lastpage: 6734
  published: 2022-05-03 00:00:00 +0000
- title: ' Near Instance Optimal Model Selection for Pure Exploration Linear Bandits '
  abstract: ' The model selection problem in the pure exploration linear bandit setting is introduced and studied in both the fixed confidence and fixed budget settings. The model selection problem considers a nested sequence of hypothesis classes of increasing complexities. Our goal is to automatically adapt to the instance-dependent complexity measure of the smallest hypothesis class containing the true model, rather than suffering from the complexity measure related to the largest hypothesis class. We provide evidence showing that a standard doubling trick over dimension fails to achieve the optimal instance-dependent sample complexity. Our algorithms define a new optimization problem based on experimental design that leverages the geometry of the action set to efficiently identify a near-optimal hypothesis class. Our fixed budget algorithm uses a novel application of a selection-validation trick in bandits. This provides a new method for the understudied fixed budget setting in linear bandits (even without the added challenge of model selection). We further generalize the model selection problem to the misspecified regime, adapting our algorithms in both fixed confidence and fixed budget settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhu22f.html
  PDF: https://proceedings.mlr.press/v151/zhu22f/zhu22f.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhu22f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yinglun
    family: Zhu
  - given: Julian
    family: Katz-Samuels
  - given: Robert
    family: Nowak
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6735-6769
  id: zhu22f
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6735
  lastpage: 6769
  published: 2022-05-03 00:00:00 +0000
- title: ' Identification in Tree-shaped Linear Structural Causal Models '
  abstract: ' Linear structural equation models represent direct causal effects as directed edges and confounding factors as bidirected edges. An open problem is to identify the causal parameters from correlations between the nodes. We investigate models, whose directed component forms a tree, and show that there, besides classical instrumental variables, missing cycles of bidirected edges can be used to identify the model. They can yield systems of quadratic equations that we explicitly solve to obtain one or two solutions for the causal parameters of adjacent directed edges. We show how multiple missing cycles can be combined to obtain a unique solution. This results in an algorithm that can identify instances that previously required approaches based on Gröbner bases, which have doubly-exponential time complexity in the number of structural parameters. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/van-der-zander22a.html
  PDF: https://proceedings.mlr.press/v151/van-der-zander22a/van-der-zander22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-van-der-zander22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Benito
    family: Van Der Zander
  - given: Marcel
    family: Wienöbst
  - given: Markus
    family: Bläser
  - given: Maciej
    family: Liskiewicz
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6770-6792
  id: van-der-zander22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6770
  lastpage: 6792
  published: 2022-05-03 00:00:00 +0000
- title: ' Pareto Optimal Model Selection in Linear Bandits '
  abstract: ' We study model selection in linear bandits, where the learner must adapt to the dimension (denoted by $d_\star$) of the smallest hypothesis class containing the true linear model while balancing exploration and exploitation. Previous papers provide various guarantees for this model selection problem, but have limitations; i.e., the analysis requires favorable conditions that allow for inexpensive statistical testing to locate the right hypothesis class or are based on the idea of “corralling” multiple base algorithms, which often performs relatively poorly in practice. These works also mainly focus on upper bounds. In this paper, we establish the first lower bound for the model selection problem. Our lower bound implies that, even with a fixed action set, adaptation to the unknown dimension $d_\star$ comes at a cost: There is no algorithm that can achieve the regret bound $\widetilde{O}(\sqrt{d_\star T})$ simultaneously for all values of $d_\star$. We propose Pareto optimal algorithms that match the lower bound. Empirical evaluations show that our algorithm enjoys superior performance compared to existing ones. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhu22g.html
  PDF: https://proceedings.mlr.press/v151/zhu22g/zhu22g.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhu22g.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yinglun
    family: Zhu
  - given: Robert
    family: Nowak
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6793-6813
  id: zhu22g
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6793
  lastpage: 6813
  published: 2022-05-03 00:00:00 +0000
- title: ' PAC Top-$k$ Identification under SST in Limited Rounds '
  abstract: ' We consider the problem of finding top-$k$ items from a set of $n$ items using actively chosen pairwise comparisons. This problem has been widely studied in machine learning and has widespread applications in recommendation systems, sports, social choice, etc. Motivated by applications where there can be a substantial delay between requesting comparisons and receiving feedback, we consider an active/adaptive learning setting where the algorithm uses limited rounds of parallel interaction with the feedback generating oracle. We study this problem under the strong stochastic transitivity (SST) noise model which is a widely studied ranking model and captures many applications. A special case of this model is the noisy comparison model for which it was recently shown that $O(n \log k)$ comparisons and $\log^* n$ rounds of adaptivity are sufficient to find the set of top-$k$ items (Cohen-Addad et al., 2020; Braverman et al., 2019). Under the more general SST model, it is known that $O(n)$ comparisons and $O(n)$ rounds are sufficient to find a PAC top-1 item (Falahatgar et al., 2017a,b); however, not much seems to be known for general $k$, even given unbounded rounds of adaptivity. We first show that $\Omega (nk)$ comparisons are necessary for PAC top-$k$ identification under SST even with unbounded adaptivity, establishing that this problem is strictly harder under SST than it is for the noisy comparison model. Our main contribution is to show that the 2-round query complexity for this problem is $\widetilde{\Theta} (n^{4/3} + nk)$, and to show that just 3 rounds are sufficient to obtain a nearly optimal query complexity of $\widetilde{\Theta}(nk)$. We further show that our 3-round result can be improved by a $\log (n)$ factor using $2 \log^* n + 4$ rounds. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/agarwal22a.html
  PDF: https://proceedings.mlr.press/v151/agarwal22a/agarwal22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-agarwal22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arpit
    family: Agarwal
  - given: Sanjeev
    family: Khanna
  - given: Prathamesh
    family: Patil
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6814-6839
  id: agarwal22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6814
  lastpage: 6839
  published: 2022-05-03 00:00:00 +0000
- title: ' Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time '
  abstract: ' In this paper we prove that Local (S)GD (or FedAvg) can optimize deep neural networks with Rectified Linear Unit (ReLU) activation function in polynomial time. Despite the established convergence theory of Local SGD on optimizing general smooth functions in communication-efficient distributed optimization, its convergence on non-smooth ReLU networks still eludes full theoretical understanding. The key property used in many Local SGD analyses on smooth functions is gradient Lipschitzness, so that the gradient on local models will not drift far away from that on the averaged model. However, this desirable property does not hold in networks with non-smooth ReLU activation functions. We show that, even though ReLU networks do not admit the gradient Lipschitzness property, the difference between gradients on local models and the averaged model will not change too much under the dynamics of Local SGD. We validate our theoretical results via extensive experiments. This work is the first to show the convergence of Local SGD on non-smooth functions, and will shed light on the optimization theory of federated training of deep neural networks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/deng22a.html
  PDF: https://proceedings.mlr.press/v151/deng22a/deng22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-deng22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuyang
    family: Deng
  - given: Mohammad
    family: Mahdi Kamani
  - given: Mehrdad
    family: Mahdavi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6840-6861
  id: deng22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6840
  lastpage: 6861
  published: 2022-05-03 00:00:00 +0000
- title: ' Dual-Level Adaptive Information Filtering for Interactive Image Segmentation '
  abstract: ' Image segmentation can be performed interactively by accepting user annotations to refine the segmentation. It seeks frequent feedback from humans, and the model is updated with a smaller batch of data in each iteration of the feedback loop. Such a training paradigm requires effective information filtering to guide the model so that it can encode vital information and avoid overfitting due to limited data and its inherent heterogeneity and noise. We propose an adaptive interactive segmentation framework to support user interaction while introducing dual-level information filtering to train a robust model. The framework integrates an encoder-decoder architecture with a style-aware augmentation module that applies augmentation to feature maps and customizes the segmentation prediction for different latent styles. It also applies a systematic label softening strategy to generate uncertainty-aware soft labels for model updates. Experiments on both medical and natural image segmentation tasks demonstrate the effectiveness of the proposed framework. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zheng22b.html
  PDF: https://proceedings.mlr.press/v151/zheng22b/zheng22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zheng22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ervine
    family: Zheng
  - given: Qi
    family: Yu
  - given: Rui
    family: Li
  - given: Pengcheng
    family: Shi
  - given: Anne
    family: Haake
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6862-6879
  id: zheng22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6862
  lastpage: 6879
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Value of Prior in Online Learning to Rank '
  abstract: ' This paper addresses the cold-start problem in online learning to rank (OLTR). We show both theoretically and empirically that priors improve the quality of ranked lists presented to users interactively based on user feedback. These priors can come in the form of unbiased estimates of the relevance of the ranked items, or more practically, can be obtained from offline-learned models. Our experiments show the effectiveness of priors in improving the short-term regret of tabular OLTR algorithms, based on Thompson sampling and BayesUCB. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kveton22a.html
  PDF: https://proceedings.mlr.press/v151/kveton22a/kveton22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kveton22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Branislav
    family: Kveton
  - given: Ofer
    family: Meshi
  - given: Masrour
    family: Zoghi
  - given: Zhen
    family: Qin
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6880-6892
  id: kveton22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6880
  lastpage: 6892
  published: 2022-05-03 00:00:00 +0000
- title: ' Doubly Mixed-Effects Gaussian Process Regression '
  abstract: ' We address the multi-task Gaussian process (GP) regression problem with the goal of decomposing input effects on outputs into components shared across or specific to tasks and samples. We propose a family of mixed-effects GPs, including doubly and translated mixed-effects GPs, that performs such a decomposition, while also modeling the complex task relationships. Instead of the tensor product widely used in multi-task GPs, we use the direct sum and Kronecker sum for Cartesian product to combine task and sample covariance functions. With this kernel, the overall input effects on outputs decompose into four components: fixed effects shared across tasks and across samples and random effects specific to each task and to each sample. We describe an efficient stochastic variational inference method for our proposed models that also significantly reduces the cost of inference for the existing mixed-effects GPs. On simulated and real-world data, we demonstrate that our approach provides higher test accuracy and interpretable decomposition. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ho-yoon22a.html
  PDF: https://proceedings.mlr.press/v151/ho-yoon22a/ho-yoon22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ho-yoon22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jun
    family: Ho Yoon
  - given: Daniel P.
    family: Jeong
  - given: Seyoung
    family: Kim
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6893-6908
  id: ho-yoon22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6893
  lastpage: 6908
  published: 2022-05-03 00:00:00 +0000
- title: ' Weighted Gaussian Process Bandits for Non-stationary Environments '
  abstract: ' In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/deng22b.html
  PDF: https://proceedings.mlr.press/v151/deng22b/deng22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-deng22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuntian
    family: Deng
  - given: Xingyu
    family: Zhou
  - given: Baekjin
    family: Kim
  - given: Ambuj
    family: Tewari
  - given: Abhishek
    family: Gupta
  - given: Ness
    family: Shroff
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6909-6932
  id: deng22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6909
  lastpage: 6932
  published: 2022-05-03 00:00:00 +0000
- title: ' Conditionally Tractable Density Estimation using Neural Networks '
  abstract: ' Tractable models such as cutset networks and sum-product networks (SPNs) have become increasingly popular because they have superior predictive performance. Among them, cutset networks, which model the mechanics of Pearl’s cutset conditioning algorithm, demonstrate great scalability and prediction accuracy. Existing research on cutset networks has mainly focused on discrete domains, and the best mechanism to extend cutset networks to continuous domains is unclear. We propose one possible alternative to cutset networks that models the full joint distribution as the product of a local, complex distribution over a small subset of variables and a fully tractable conditional distribution whose parameters are controlled using a neural network. This model admits exact inference when all variables in the local distribution are observed, and although the model is not fully tractable in general, we show that “cutset” sampling can be employed to efficiently generate accurate predictions in practice. We show that our model performs comparably to or better than existing competitors through a variety of prediction tasks on real datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dong22a.html
  PDF: https://proceedings.mlr.press/v151/dong22a/dong22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dong22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hailiang
    family: Dong
  - given: Chiradeep
    family: Roy
  - given: Tahrima
    family: Rahman
  - given: Vibhav
    family: Gogate
  - given: Nicholas
    family: Ruozzi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6933-6946
  id: dong22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6933
  lastpage: 6946
  published: 2022-05-03 00:00:00 +0000
- title: ' Common Information based Approximate State Representations in Multi-Agent Reinforcement Learning '
  abstract: ' Due to information asymmetry, finding optimal policies for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) is hard with the complexity growing doubly exponentially in the horizon length. The challenge increases greatly in the multi-agent reinforcement learning (MARL) setting where the transition probabilities, observation kernel, and reward function are unknown. Here, we develop a general compression framework with approximate common and private state representations, based on which decentralized policies can be constructed. We derive the optimality gap of executing dynamic programming (DP) with the approximate states in terms of the approximation error parameters and the remaining time steps. When the compression is exact (no error), the resulting DP is equivalent to the one in existing work. Our general framework generalizes a number of methods proposed in the literature. The results shed light on designing practically useful deep-MARL network structures under the "centralized learning distributed execution" scheme. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kao22a.html
  PDF: https://proceedings.mlr.press/v151/kao22a/kao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hsu
    family: Kao
  - given: Vijay
    family: Subramanian
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6947-6967
  id: kao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6947
  lastpage: 6967
  published: 2022-05-03 00:00:00 +0000
- title: ' Sensing Cox Processes via Posterior Sampling and Positive Bases '
  abstract: ' We study adaptive sensing of Cox point processes, a widely used model from spatial statistics. We introduce three tasks: maximization of captured events, search for the maximum of the intensity function, and learning level sets of the intensity function. We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis. In this basis, the positivity constraint on the intensity function has a simple form. We show how the <em>minimal description positive basis</em> can be adapted to the covariance kernel and to non-stationarity, and make connections to common positive bases from prior works. Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (<em>Cox-Thompson</em>) and top-two posterior sampling (<em>Top2</em>) principles. With the latter, the difference between samples serves as a surrogate for the uncertainty. We demonstrate the approach using examples from environmental monitoring and crime rate modeling, and compare it to the classical Bayesian experimental design approach. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mutny22a.html
  PDF: https://proceedings.mlr.press/v151/mutny22a/mutny22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mutny22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mojmir
    family: Mutny
  - given: Andreas
    family: Krause
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6968-6989
  id: mutny22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6968
  lastpage: 6989
  published: 2022-05-03 00:00:00 +0000
- title: ' A Predictive Approach to Bayesian Nonparametric Survival Analysis '
  abstract: ' Bayesian nonparametric methods are a popular choice for analysing survival data due to their ability to flexibly model the distribution of survival times. These methods typically employ a nonparametric prior on the survival function that is conjugate with respect to right-censored data. Eliciting these priors, particularly in the presence of covariates, can be challenging and inference typically relies on computationally intensive Markov chain Monte Carlo schemes. In this paper, we build on recent work that recasts Bayesian inference as assigning a predictive distribution on the unseen values of a population conditional on the observed samples, thus avoiding the need to specify a complex prior. We describe a copula-based predictive update which admits a scalable sequential importance sampling algorithm to perform inference that properly accounts for right-censoring. We provide theoretical justification through an extension of Doob’s consistency theorem and illustrate the method on a number of simulated and real data sets, including an example with covariates. Our approach enables analysts to perform Bayesian nonparametric inference through only the specification of a predictive distribution. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/fong22a.html
  PDF: https://proceedings.mlr.press/v151/fong22a/fong22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-fong22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Edwin
    family: Fong
  - given: Brieuc
    family: Lehmann
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 6990-7013
  id: fong22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 6990
  lastpage: 7013
  published: 2022-05-03 00:00:00 +0000
- title: ' On the equivalence of Oja’s algorithm and GROUSE '
  abstract: ' The analysis of streaming PCA has gained significant traction through the analysis of an early simple variant: Oja’s algorithm, which implements online projected gradient descent for the trace objective. Several other streaming PCA algorithms have been developed, each with their own performance guarantees or empirical studies, and the question arises whether there is a relationship between the algorithms. We show that the Grassmannian Rank-One Subspace Estimation (GROUSE) algorithm is indeed equivalent to Oja’s algorithm in the sense that, at each iteration, given a step size for one of the algorithms, we may construct a step size for the other algorithm that results in an identical update. This allows us to apply all results on one algorithm to the other. In particular, we have (1) better global convergence guarantees of GROUSE to the global minimizer of the PCA objective with full data; and (2) local convergence guarantees for Oja’s algorithm with incomplete or compressed data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/balzano22a.html
  PDF: https://proceedings.mlr.press/v151/balzano22a/balzano22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-balzano22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Laura
    family: Balzano
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7014-7030
  id: balzano22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7014
  lastpage: 7030
  published: 2022-05-03 00:00:00 +0000
- title: ' Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes '
  abstract: ' In Bayesian Optimization (BO) we study black-box function optimization with noisy point evaluations and Bayesian priors. Convergence of BO can be greatly sped up by batching, where multiple evaluations of the black-box function are performed in a single round. The main difficulty in this setting is to propose batches of evaluation points that are at the same time diverse and informative. In this work, we introduce DPP-Batch Bayesian Optimization (DPP-BBO), a universal framework for inducing batch diversity in sampling-based BO by leveraging the repulsive properties of Determinantal Point Processes (DPP) to naturally diversify the batch sampling procedure. We illustrate this framework by formulating DPP-Thompson Sampling (DPP-TS) as a variant of the popular Thompson Sampling (TS) algorithm and introducing a Markov Chain Monte Carlo procedure to sample from it. We then prove novel Bayesian simple regret bounds for both classical batched TS and our counterpart DPP-TS, with the latter bound being tighter. Our real-world, as well as synthetic, experiments demonstrate improved performance of DPP-BBO over classical batching methods with Gaussian process and Cox process models. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nava22a.html
  PDF: https://proceedings.mlr.press/v151/nava22a/nava22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nava22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Elvis
    family: Nava
  - given: Mojmir
    family: Mutny
  - given: Andreas
    family: Krause
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7031-7054
  id: nava22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7031
  lastpage: 7054
  published: 2022-05-03 00:00:00 +0000
- title: ' Label differential privacy via clustering '
  abstract: ' We present new mechanisms for label differential privacy, a relaxation of differentially private machine learning that only protects the privacy of the labels in the training set. Our mechanisms cluster the examples in the training set using their (non-private) feature vectors, randomly re-sample each label from examples in the same cluster, and output a training set with noisy labels as well as a modified version of the true loss function. We prove that when the clusters are both large and high-quality, the model that minimizes the modified loss on the noisy training set converges to small excess risk at a rate that is comparable to the rate for non-private learning. We also describe a learning problem in which large clusters are necessary to achieve both strong privacy and either good precision or good recall. Our experiments show that randomizing the labels within each cluster significantly improves the privacy vs. accuracy trade-off compared to applying uniform randomized response to the labels, and also compared to learning a model via DP-SGD. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/esfandiari22a.html
  PDF: https://proceedings.mlr.press/v151/esfandiari22a/esfandiari22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-esfandiari22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hossein
    family: Esfandiari
  - given: Vahab
    family: Mirrokni
  - given: Umar
    family: Syed
  - given: Sergei
    family: Vassilvitskii
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7055-7075
  id: esfandiari22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7055
  lastpage: 7075
  published: 2022-05-03 00:00:00 +0000
- title: ' Neural score matching for high-dimensional causal inference '
  abstract: ' Traditional methods for matching in causal inference are impractical for high-dimensional datasets. They suffer from the curse of dimensionality: exact matching and coarsened exact matching find exponentially fewer matches as the input dimension grows, and propensity score matching may match highly unrelated units together. To overcome this problem, we develop theoretical results which motivate the use of neural networks to obtain non-trivial, multivariate balancing scores of a chosen level of coarseness, in contrast to the classical, scalar propensity score. We leverage these balancing scores to perform matching for high-dimensional causal inference and call this procedure neural score matching. We show that our method is competitive against other matching approaches on semi-synthetic high-dimensional datasets, both in terms of treatment effect estimation and reducing imbalance. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/clivio22a.html
  PDF: https://proceedings.mlr.press/v151/clivio22a/clivio22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-clivio22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Oscar
    family: Clivio
  - given: Fabian
    family: Falck
  - given: Brieuc
    family: Lehmann
  - given: George
    family: Deligiannidis
  - given: Chris
    family: Holmes
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7076-7110
  id: clivio22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7076
  lastpage: 7110
  published: 2022-05-03 00:00:00 +0000
- title: ' Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks '
  abstract: ' Stochastic linear contextual bandit algorithms have substantial applications in practice, such as recommender systems, online advertising, and clinical trials. Recent works show that optimal bandit algorithms are vulnerable to adversarial attacks and can fail completely in the presence of attacks. Existing robust bandit algorithms only work for the non-contextual setting under the attack of rewards and cannot improve the robustness in the general and popular contextual bandit environment. In addition, none of the existing methods can defend against attacked context. In this work, we provide the first robust bandit algorithm for the stochastic linear contextual bandit setting under a fully adaptive and omniscient attack with sub-linear regret. Our algorithm not only works under the attack of rewards, but also under attacked context. Moreover, it does not need any information about the attack budget or the particular form of the attack. We provide theoretical guarantees for our proposed algorithm and show by experiments that our proposed algorithm improves the robustness against various kinds of popular attacks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ding22c.html
  PDF: https://proceedings.mlr.press/v151/ding22c/ding22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ding22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Qin
    family: Ding
  - given: Cho-Jui
    family: Hsieh
  - given: James
    family: Sharpnack
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7111-7123
  id: ding22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7111
  lastpage: 7123
  published: 2022-05-03 00:00:00 +0000
- title: ' Multi-class classification in nonparametric active learning '
  abstract: ' Several works have recently focused on nonparametric active learning, especially in the binary classification setting under Hölder smoothness assumptions on the regression function. These works have highlighted the benefit of active learning by providing better rates of convergence compared to the passive counterpart. In this paper, we extend these results to multiclass classification under a more general smoothness assumption, which takes into account a broader class of underlying distributions. We present a new algorithm called MKAL for multiclass K-nearest neighbors active learning, and prove its theoretical benefits. Additionally, we empirically study MKAL on several datasets and discuss its merits and potential improvements. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ndjia-njike22a.html
  PDF: https://proceedings.mlr.press/v151/ndjia-njike22a/ndjia-njike22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ndjia-njike22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Boris
    family: Ndjia Njike
  - given: Xavier
    family: Siebert
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7124-7162
  id: ndjia-njike22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7124
  lastpage: 7162
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Assumptions of Synthetic Control Methods '
  abstract: ' Synthetic control (SC) methods have been widely applied to estimate the causal effect of large-scale interventions, e.g., the state-wide effect of a change in policy. The idea of synthetic controls is to approximate one unit’s counterfactual outcomes using a weighted combination of some other units’ observed outcomes. The motivating question of this paper is: how does the SC strategy lead to valid causal inferences? We address this question by re-formulating the causal inference problem targeted by SC with a more fine-grained model, where we change the unit of analysis from “large units” (e.g., states) to “small units” (e.g., individuals in states). Under the re-formulation, we derive sufficient conditions for the non-parametric causal identification of the causal effect. We show that, in some settings, existing linear SC estimators are valid even when the data generating process is non-linear. We highlight two implications of the reformulation: 1) it clarifies where “linearity” comes from, and how it falls naturally out of the more fine-grained and flexible model; 2) it suggests new ways of using available data with SC methods for valid causal inference, in particular, new ways of selecting observations from which to estimate the counterfactual. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shi22b.html
  PDF: https://proceedings.mlr.press/v151/shi22b/shi22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shi22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Claudia
    family: Shi
  - given: Dhanya
    family: Sridhar
  - given: Vishal
    family: Misra
  - given: David
    family: Blei
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7163-7175
  id: shi22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7163
  lastpage: 7175
  published: 2022-05-03 00:00:00 +0000
- title: ' Faster Rates, Adaptive Algorithms, and Finite-Time Bounds for Linear Composition Optimization and Gradient TD Learning '
  abstract: ' Gradient temporal difference (GTD) algorithms are provably convergent policy evaluation methods for off-policy reinforcement learning. Despite much progress, proper tuning of the stochastic approximation methods used to solve the resulting saddle point optimization problem requires the knowledge of several (unknown) problem-dependent parameters. In this paper we apply adaptive step-size tuning strategies to greatly reduce this dependence on prior knowledge, and provide algorithms with adaptive convergence guarantees. In addition, we use the underlying refined analysis technique to obtain new O(1/T) rates that do not depend on the strong-convexity parameter of the problem, and also apply to the Markov noise setting, as well as the unbounded i.i.d. noise setting. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/raj22a.html
  PDF: https://proceedings.mlr.press/v151/raj22a/raj22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-raj22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Anant
    family: Raj
  - given: Pooria
    family: Joulani
  - given: Andras
    family: Gyorgy
  - given: Csaba
    family: Szepesvari
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7176-7186
  id: raj22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7176
  lastpage: 7186
  published: 2022-05-03 00:00:00 +0000
- title: ' Investigating the Role of Negatives in Contrastive Representation Learning '
  abstract: ' Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many design decisions, such as the choice of data augmentation, the number of negative examples, and the batch size; however, there is limited understanding as to how these parameters interact and affect downstream performance. We focus on disambiguating the role of one of these parameters: the number of negative examples. Theoretically, we show the existence of a collision-coverage trade-off suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data. Empirically, we scrutinize the role of the number of negatives in both NLP and vision tasks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ash22a.html
  PDF: https://proceedings.mlr.press/v151/ash22a/ash22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ash22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jordan
    family: Ash
  - given: Surbhi
    family: Goel
  - given: Akshay
    family: Krishnamurthy
  - given: Dipendra
    family: Misra
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7187-7209
  id: ash22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7187
  lastpage: 7209
  published: 2022-05-03 00:00:00 +0000
- title: ' Minimax Kernel Machine Learning for a Class of Doubly Robust Functionals with Application to Proximal Causal Inference '
  abstract: ' Robins et al. (2008) introduced a class of influence functions (IFs) which could be used to obtain doubly robust moment functions for the corresponding parameters. However, that class does not include the IF of parameters for which the nuisance functions are solutions to integral equations. Such parameters are particularly important in the field of causal inference, specifically in the recently proposed proximal causal inference framework of Tchetgen Tchetgen et al. (2020), which allows for estimating the causal effect in the presence of latent confounders. In this paper, we first extend the class of Robins et al. to include doubly robust IFs in which the nuisance functions are solutions to integral equations. Then we demonstrate that the double robustness property of these IFs can be leveraged to construct estimating equations for the nuisance functions, which enables us to solve the integral equations without resorting to parametric models. We frame the estimation of the nuisance functions as a minimax optimization problem. We provide convergence rates for the nuisance functions and conditions required for asymptotic linearity of the estimator of the parameter of interest. The experimental results demonstrate that our proposed methodology leads to robust and high-performance estimators for the average causal effect in the proximal causal inference framework. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ghassami22a.html
  PDF: https://proceedings.mlr.press/v151/ghassami22a/ghassami22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ghassami22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Amiremad
    family: Ghassami
  - given: Andrew
    family: Ying
  - given: Ilya
    family: Shpitser
  - given: Eric
    family: Tchetgen Tchetgen
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7210-7239
  id: ghassami22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7210
  lastpage: 7239
  published: 2022-05-03 00:00:00 +0000
- title: ' Derivative-Based Neural Modelling of Cumulative Distribution Functions for Survival Analysis '
  abstract: ' Survival models, particularly those able to account for patient comorbidities via competing risks analysis, offer valuable prognostic information to clinicians making critical decisions and represent a growing area of application for machine learning approaches. However, current methods typically involve restrictive parameterisations, discretisation of time or the modelling of only one event cause. In this paper, we highlight how general cumulative distribution functions can be naturally expressed via neural network-based ordinary differential equations and how this observation can be utilised in survival analysis. In particular, we present DeSurv, a neural derivative-based approach capable of avoiding the aforementioned restrictions and flexibly modelling competing-risk survival data in continuous time. We apply DeSurv to both single-risk and competing-risk synthetic and real-world datasets and obtain results which compare favourably with current state-of-the-art models. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/danks22a.html
  PDF: https://proceedings.mlr.press/v151/danks22a/danks22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-danks22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dominic
    family: Danks
  - given: Christopher
    family: Yau
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7240-7256
  id: danks22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7240
  lastpage: 7256
  published: 2022-05-03 00:00:00 +0000
- title: ' Cycle Consistent Probability Divergences Across Different Spaces '
  abstract: ' Discrepancy measures between probability distributions are at the core of statistical inference and machine learning. In many applications, distributions of interest are supported on different spaces, and yet a meaningful correspondence between data points is desired. Motivated to explicitly encode consistent bidirectional maps into the discrepancy measure, this work proposes a novel unbalanced Monge optimal transport formulation for matching, up to isometries, distributions on different spaces. Our formulation arises as a principled relaxation of the Gromov-Hausdorff distance between metric spaces, and employs two cycle-consistent maps that push forward each distribution onto the other. We study structural properties of the proposed discrepancy and, in particular, show that it captures the popular cycle-consistent generative adversarial network (GAN) framework as a special case, thereby providing the theory to explain it. Motivated by computational efficiency, we then kernelize the discrepancy and restrict the mappings to parametric function classes. The resulting kernelized version is coined the generalized maximum mean discrepancy (GMMD). Convergence rates for empirical estimation of GMMD are studied and experiments to support our theory are provided. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhang22d.html
  PDF: https://proceedings.mlr.press/v151/zhang22d/zhang22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhang22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zhengxin
    family: Zhang
  - given: Youssef
    family: Mroueh
  - given: Ziv
    family: Goldfeld
  - given: Bharath
    family: Sriperumbudur
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7257-7285
  id: zhang22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7257
  lastpage: 7285
  published: 2022-05-03 00:00:00 +0000
- title: ' Warping Layer: Representation Learning for Label Structures in Weakly Supervised Learning '
  abstract: ' Many learning tasks only receive weak supervision, such as semi-supervised learning and few-shot learning. With limited labeled data, prior structures become especially important, and prominent examples include hierarchies and mutual exclusions in the class space. However, most existing approaches only learn the representations separately in the feature space and the label space, and do not explicitly enforce the logical relationships. In this paper, we propose a novel warping layer that jointly learns representations in both spaces, and thanks to its modularity and differentiability, it can be directly embedded into generative models to leverage the prior hierarchical structure and unlabeled data. The effectiveness of the warping layer is demonstrated on both few-shot and semi-supervised learning, outperforming the state of the art in practice. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ma22a.html
  PDF: https://proceedings.mlr.press/v151/ma22a/ma22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ma22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yingyi
    family: Ma
  - given: Xinhua
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7286-7299
  id: ma22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7286
  lastpage: 7299
  published: 2022-05-03 00:00:00 +0000
- title: ' A Bandit Model for Human-Machine Decision Making with Private Information and Opacity '
  abstract: ' Applications of machine learning inform human decision makers in a broad range of tasks. The resulting problem is usually formulated in terms of a single decision maker. We argue that it should rather be described as a two-player learning problem where one player is the machine and the other the human. While both players try to optimize the final decision, the setup is often characterized by (1) the presence of private information and (2) opacity, that is, imperfect understanding between the decision makers. We prove that both properties can complicate decision making considerably. A lower bound quantifies the worst-case hardness of optimally advising a decision maker who is opaque or has access to private information. An upper bound shows that a simple coordination strategy is nearly minimax optimal. More efficient learning is possible under certain assumptions on the problem, for example that both players learn to take actions independently. Such assumptions are implicit in existing literature, for example in medical applications of machine learning, but have not been described or justified theoretically. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bordt22a.html
  PDF: https://proceedings.mlr.press/v151/bordt22a/bordt22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bordt22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sebastian
    family: Bordt
  - given: Ulrike
    family: Von Luxburg
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7300-7319
  id: bordt22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7300
  lastpage: 7319
  published: 2022-05-03 00:00:00 +0000
- title: ' Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domains by Adaptive Discretization '
  abstract: ' Gaussian process optimization is a successful class of algorithms (e.g., GP-UCB) to optimize a black-box function through sequential evaluations. However, for functions with continuous domains, Gaussian process optimization has to rely on either a fixed discretization of the space, or the solution of a non-convex optimization subproblem at each evaluation. The first approach can negatively affect performance, while the second approach requires a heavy computational burden. A third option, only recently theoretically studied, is to adaptively discretize the function domain. Even though this approach avoids the extra non-convex optimization costs, the overall computational complexity is still prohibitive. An algorithm such as GP-UCB has a runtime of $O(T^4)$, where $T$ is the number of iterations. In this paper, we introduce Ada-BKB (Adaptive Budgeted Kernelized Bandit), a no-regret Gaussian process optimization algorithm for functions on continuous domains, that provably runs in $O(T^2 d_\text{eff}^2)$, where $d_\text{eff}$ is the effective dimension of the explored space, and which is typically much smaller than $T$. We corroborate our theoretical findings with experiments on synthetic non-convex functions and on the real-world problem of hyper-parameter optimization, confirming the good practical performance of the proposed approach. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/rando22a.html
  PDF: https://proceedings.mlr.press/v151/rando22a/rando22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-rando22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Marco
    family: Rando
  - given: Luigi
    family: Carratino
  - given: Silvia
    family: Villa
  - given: Lorenzo
    family: Rosasco
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7320-7348
  id: rando22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7320
  lastpage: 7348
  published: 2022-05-03 00:00:00 +0000
- title: ' Adaptive Multi-Goal Exploration '
  abstract: ' We introduce a generic strategy for provably efficient multi-goal exploration. It relies on AdaGoal, a novel goal selection scheme that leverages a measure of uncertainty in reaching states to adaptively target goals that are neither too difficult nor too easy. We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy for the (initially unknown) set of goal states that are reachable within $L$ steps in expectation from a reference state $s_0$ in a reward-free Markov decision process. In the tabular case with $S$ states and $A$ actions, our algorithm requires $\tilde{O}(L^3 S A \epsilon^{-2})$ exploration steps, which is nearly minimax optimal. We also readily instantiate AdaGoal in linear mixture Markov decision processes, yielding the first goal-oriented PAC guarantee with linear function approximation. Beyond its strong theoretical guarantees, we anchor AdaGoal in goal-conditioned deep reinforcement learning, both conceptually and empirically, by connecting its idea of selecting "uncertain" goals to maximizing value ensemble disagreement. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tarbouriech22a.html
  PDF: https://proceedings.mlr.press/v151/tarbouriech22a/tarbouriech22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tarbouriech22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jean
    family: Tarbouriech
  - given: Omar
    family: Darwiche Domingues
  - given: Pierre
    family: Menard
  - given: Matteo
    family: Pirotta
  - given: Michal
    family: Valko
  - given: Alessandro
    family: Lazaric
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7349-7383
  id: tarbouriech22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7349
  lastpage: 7383
  published: 2022-05-03 00:00:00 +0000
- title: ' Chernoff Sampling for Active Testing and Extension to Active Regression '
  abstract: ' Active learning can reduce the number of samples needed to perform a hypothesis test and to estimate the parameters of a model. In this paper, we revisit the work of Chernoff that described an asymptotically optimal algorithm for performing a hypothesis test. We obtain a novel sample complexity bound for Chernoff’s algorithm, with a non-asymptotic term that characterizes its performance at a fixed confidence level. We also develop an extension of Chernoff sampling that can be used to estimate the parameters of a wide variety of models and we obtain a non-asymptotic bound on the estimation error. We apply our extension of Chernoff sampling to actively learn neural network models and to estimate parameters in real-data linear and non-linear regression problems, where our approach compares favorably with state-of-the-art methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mukherjee22a.html
  PDF: https://proceedings.mlr.press/v151/mukherjee22a/mukherjee22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mukherjee22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Subhojyoti
    family: Mukherjee
  - given: Ardhendu S.
    family: Tripathy
  - given: Robert
    family: Nowak
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7384-7432
  id: mukherjee22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7384
  lastpage: 7432
  published: 2022-05-03 00:00:00 +0000
- title: ' An Information-theoretical Approach to Semi-supervised Learning under Covariate-shift '
  abstract: ' A common assumption in semi-supervised learning is that the labeled, unlabeled, and test data are drawn from the same distribution. However, this assumption is not satisfied in many applications. In many scenarios, the data is collected sequentially (e.g., healthcare) and the distribution of the data may change over time, often exhibiting so-called covariate shifts. In this paper, we propose an approach for semi-supervised learning algorithms that is capable of addressing this issue. Our framework also recovers some popular methods, including entropy minimization and pseudo-labeling. We provide new information-theoretic generalization error upper bounds inspired by our novel framework. Our bounds are applicable to both general semi-supervised learning and the covariate-shift scenario. Finally, we show numerically that our method outperforms previous approaches proposed for semi-supervised learning under covariate shift. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/aminian22a.html
  PDF: https://proceedings.mlr.press/v151/aminian22a/aminian22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-aminian22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Gholamali
    family: Aminian
  - given: Mahed
    family: Abroshan
  - given: Mohammad
    family: Mahdi Khalili
  - given: Laura
    family: Toni
  - given: Miguel
    family: Rodrigues
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7433-7449
  id: aminian22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7433
  lastpage: 7449
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal Design of Stochastic DNA Synthesis Protocols based on Generative Sequence Models '
  abstract: ' Generative probabilistic models of biological sequences have widespread existing and potential applications in analyzing, predicting and designing proteins, RNA and genomes. To test the predictions of such a model experimentally, the standard approach is to draw samples, and then synthesize each sample individually in the laboratory. However, often orders of magnitude more sequences can be experimentally assayed than can be affordably synthesized individually. In this article, we propose instead to use stochastic synthesis methods, such as mixed nucleotides or trimers. We describe a black-box algorithm for optimizing stochastic synthesis protocols to produce approximate samples from any target generative model. We establish theoretical bounds on the method’s performance, and validate it in simulation using held-out sequence-to-function predictors trained on real experimental data. We show that using optimized stochastic synthesis protocols in place of individual synthesis can increase the number of hits in protein engineering efforts by orders of magnitude, e.g. from zero to a thousand. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/weinstein22a.html
  PDF: https://proceedings.mlr.press/v151/weinstein22a/weinstein22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-weinstein22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Eli N.
    family: Weinstein
  - given: Alan N.
    family: Amin
  - given: Will S.
    family: Grathwohl
  - given: Daniel
    family: Kassler
  - given: Jean
    family: Disset
  - given: Debora
    family: Marks
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7450-7482
  id: weinstein22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7450
  lastpage: 7482
  published: 2022-05-03 00:00:00 +0000
- title: ' Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances '
  abstract: ' We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting. In this setting, the reward for the $i$-th arm follows a $\sigma^2_i$-sub-Gaussian distribution, and the agent needs to incorporate this knowledge to minimize the expected number of arm pulls to identify $m$ arms with the largest means within error $\epsilon$ out of the $n$ arms, with probability at least $1-\delta$. We show that the worst-case sample complexity of this problem is $$\Theta\left( \sum_{i =1}^n \frac{\sigma_i^2}{\epsilon^2} \ln\frac{1}{\delta} + \sum_{i \in G^{m}} \frac{\sigma_i^2}{\epsilon^2} \ln(m) + \sum_{j \in G^{l}} \frac{\sigma_j^2}{\epsilon^2} \text{Ent}(\sigma^2_{G^{r}}) \right), $$ where $G^{m}, G^{l}, G^{r}$ are certain specific subsets of the overall arm set $\{1, 2, \ldots, n\}$, and $\text{Ent}(\cdot)$ is an entropy-like function which measures the heterogeneity of the variance proxies. The upper bound of the complexity is obtained using a divide-and-conquer style algorithm, while the matching lower bound relies on the study of a dual formulation. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhou22c.html
  PDF: https://proceedings.mlr.press/v151/zhou22c/zhou22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhou22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ruida
    family: Zhou
  - given: Chao
    family: Tian
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7483-7504
  id: zhou22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7483
  lastpage: 7504
  published: 2022-05-03 00:00:00 +0000
- title: ' On perfectness in Gaussian graphical models '
  abstract: ' Knowing when a graphical model perfectly encodes the conditional independence structure of a distribution is essential in applications, and this is particularly important when performing inference from data. When the model is perfect, there is a one-to-one correspondence between conditional independence statements in the distribution and separation statements in the graph. Previous work has shown that almost all models based on linear directed acyclic graphs as well as Gaussian chain graphs are perfect, the latter of which subsumes Gaussian graphical models (i.e., the undirected Gaussian models) as a special case. In this paper, we directly approach the problem of perfectness for the Gaussian graphical models, and provide a new proof, via a more transparent parametrization, that almost all such models are perfect. Our approach is based on, and substantially extends, a construction of Lněnička and Matúš showing the existence of a perfect Gaussian distribution for any graph. The analysis involves constructing a probability measure on the set of normalized covariance matrices Markov with respect to a graph that may be of independent interest. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/amini22a.html
  PDF: https://proceedings.mlr.press/v151/amini22a/amini22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-amini22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arash
    family: Amini
  - given: Bryon
    family: Aragam
  - given: Qing
    family: Zhou
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7505-7517
  id: amini22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7505
  lastpage: 7517
  published: 2022-05-03 00:00:00 +0000
- title: ' GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RL '
  abstract: ' Out-of-distribution (OOD) detection is a well-studied topic in supervised learning. Extending the successes in supervised learning methods to the reinforcement learning (RL) setting, however, is difficult due to the data generating process: RL agents actively query their environment for data, and this data is a function of the policy followed by the agent. Thus, an agent could neglect a shift in the environment if its policy did not lead it to explore the aspect of the environment that shifted. Therefore, to achieve safe and robust generalization in RL, there exists an unmet need for OOD detection through active experimentation. Here, we attempt to bridge this lacuna by first defining a causal framework for OOD scenarios or environments encountered by RL agents in the wild. Then, we propose a novel task, that of Out-of-Task Distribution (OOTD) detection. We introduce an RL agent which actively experiments in a test environment and subsequently concludes whether it is OOTD or not. We name our method GalilAI, in honor of Galileo Galilei, as it also discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body. Finally, we propose a simple probabilistic neural network baseline for comparison, which extends extant Model-Based RL. We find that our method outperforms the baseline significantly. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sontakke22a.html
  PDF: https://proceedings.mlr.press/v151/sontakke22a/sontakke22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sontakke22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sumedh A.
    family: Sontakke
  - given: Stephen
    family: Iota
  - given: Zizhao
    family: Hu
  - given: Arash
    family: Mehrjou
  - given: Laurent
    family: Itti
  - given: Bernhard
    family: Schölkopf
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7518-7530
  id: sontakke22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7518
  lastpage: 7530
  published: 2022-05-03 00:00:00 +0000
- title: ' Efficient interventional distribution learning in the PAC framework '
  abstract: ' We consider the problem of efficiently inferring interventional distributions in a causal Bayesian network from a finite number of observations. Let $P$ be a causal model on a set $V$ of observable variables on a given causal graph $G$. For sets $X,Y \subseteq V$, and setting $x$ to $X$, $P_x(Y)$ denotes the interventional distribution on $Y$ with respect to an intervention $x$ to variables $X$. Shpitser and Pearl (AAAI 2006), building on the work of Tian and Pearl (AAAI 2001), proved that the ID algorithm is sound and complete for recovering $P_x(Y)$ from observations. We give the first provably efficient version of the ID algorithm. In particular, under natural assumptions, we give a polynomial-time algorithm that on input a causal graph $G$ on observable variables $V$, a setting $x$ of a set $X \subseteq V$ of bounded size, outputs succinct descriptions of both an evaluator and a generator for a distribution $\hat{P}$ that is epsilon-close (in total variation distance) to $P_x(Y)$ where $Y = V \setminus X$, if $P_x(Y)$ is identifiable. We also show that when $Y$ is an arbitrary subset of $V \setminus X$, there is no efficient algorithm that outputs an evaluator of a distribution that is epsilon-close to $P_x(Y)$ unless all problems that have statistical zero-knowledge proofs, including the Graph Isomorphism problem, have efficient randomized algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bhattacharyya22a.html
  PDF: https://proceedings.mlr.press/v151/bhattacharyya22a/bhattacharyya22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bhattacharyya22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arnab
    family: Bhattacharyya
  - given: Sutanu
    family: Gayen
  - given: Saravanan
    family: Kandasamy
  - given: Vedant
    family: Raval
  - given: Vinodchandran N.
    family: Variyam
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7531-7549
  id: bhattacharyya22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7531
  lastpage: 7549
  published: 2022-05-03 00:00:00 +0000
- title: ' Dropout as a Regularizer of Interaction Effects '
  abstract: ' We examine Dropout through the perspective of interactions. This view provides a symmetry to explain Dropout: given N variables, there are N choose k possible sets of k variables to form an interaction (i.e. O(N^k)); conversely, the probability an interaction of k variables survives Dropout at rate p is (1-p)^k (decaying with k). These rates effectively cancel, and so Dropout regularizes against higher-order interactions. We prove this perspective analytically and empirically. This perspective of Dropout as a regularizer against interaction effects has several practical implications: (1) higher Dropout rates should be used when we need stronger regularization against spurious high-order interactions, (2) caution should be exercised when interpreting Dropout-based explanations and uncertainty measures, and (3) networks trained with Input Dropout are biased estimators. We also compare Dropout to other regularizers and find that it is difficult to obtain the same selective pressure against high-order interactions with these methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lengerich22a.html
  PDF: https://proceedings.mlr.press/v151/lengerich22a/lengerich22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lengerich22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Benjamin J.
    family: Lengerich
  - given: Eric
    family: Xing
  - given: Rich
    family: Caruana
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7550-7564
  id: lengerich22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7550
  lastpage: 7564
  published: 2022-05-03 00:00:00 +0000
- title: ' Thompson Sampling with a Mixture Prior '
  abstract: ' We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, and call the resulting algorithm MixTS. To analyze MixTS, we develop a novel and general proof technique for analyzing the concentration of mixture distributions. We use it to derive Bayes regret bounds for MixTS in both linear bandits and finite-horizon reinforcement learning (RL). Our regret bounds reflect the structure of the mixture prior, and depend on the number of mixture components and their width. We demonstrate the empirical effectiveness of MixTS in synthetic and real-world experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hong22b.html
  PDF: https://proceedings.mlr.press/v151/hong22b/hong22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hong22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Joey
    family: Hong
  - given: Branislav
    family: Kveton
  - given: Manzil
    family: Zaheer
  - given: Mohammad
    family: Ghavamzadeh
  - given: Craig
    family: Boutilier
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7565-7586
  id: hong22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7565
  lastpage: 7586
  published: 2022-05-03 00:00:00 +0000
- title: ' SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification '
  abstract: ' Federated learning is inherently vulnerable to model poisoning attacks because its decentralized nature allows attackers to participate with compromised devices. In model poisoning attacks, the attacker reduces the model’s performance on targeted sub-tasks (e.g. classifying planes as birds) by uploading "poisoned" updates. In this paper we introduce SparseFed, a novel defense that uses global top-k update sparsification and device-level gradient clipping to mitigate model poisoning attacks. We propose a theoretical framework for analyzing the robustness of defenses against poisoning attacks, and provide robustness and convergence analysis of our algorithm. To validate its empirical efficacy we conduct an open-source evaluation at scale across multiple benchmark datasets for computer vision and federated learning. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/panda22a.html
  PDF: https://proceedings.mlr.press/v151/panda22a/panda22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-panda22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ashwinee
    family: Panda
  - given: Saeed
    family: Mahloujifar
  - given: Arjun
    family: Nitin Bhagoji
  - given: Supriyo
    family: Chakraborty
  - given: Prateek
    family: Mittal
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7587-7624
  id: panda22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7587
  lastpage: 7624
  published: 2022-05-03 00:00:00 +0000
- title: ' Nearly Optimal Algorithms for Level Set Estimation '
  abstract: ' The level set estimation problem seeks to find all points in a domain $\mathcal{X}$ where the value of an unknown function $f:\mathcal{X}\rightarrow \mathbb{R}$ exceeds a threshold $\alpha$. The estimation is based on noisy function evaluations that may be acquired at sequentially and adaptively chosen locations in $\mathcal{X}$. The threshold value $\alpha$ can either be explicit and provided a priori, or implicit and defined relative to the optimal function value, i.e. $\alpha = (1-\epsilon)f(\mathbf{x}_\ast)$ for a given $\epsilon > 0$ where $f(\mathbf{x}_\ast)$ is the maximal function value and is unknown. In this work we provide a new approach to the level set estimation problem by relating it to recent adaptive experimental design methods for linear bandits in the Reproducing Kernel Hilbert Space (RKHS) setting. We assume that $f$ can be approximated by a function in the RKHS up to an unknown misspecification and provide novel algorithms for both the implicit and explicit cases in this setting with strong theoretical guarantees. Moreover, in the linear (kernel) setting, we show that our bounds are nearly optimal, namely, our upper bounds match existing lower bounds for threshold linear bandits. To our knowledge this work provides the first instance-dependent, non-asymptotic upper bounds on sample complexity of level-set estimation that match information theoretic lower bounds. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mason22a.html
  PDF: https://proceedings.mlr.press/v151/mason22a/mason22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mason22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Blake
    family: Mason
  - given: Lalit
    family: Jain
  - given: Subhojyoti
    family: Mukherjee
  - given: Romain
    family: Camilleri
  - given: Kevin
    family: Jamieson
  - given: Robert
    family: Nowak
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7625-7658
  id: mason22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7625
  lastpage: 7658
  published: 2022-05-03 00:00:00 +0000
- title: ' Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization '
  abstract: ' Smooth minimax games often proceed by simultaneous or alternating gradient updates. Although algorithms with alternating updates are commonly used in practice, the majority of existing theoretical analyses focus on simultaneous algorithms for convenience of analysis. In this paper, we study alternating gradient descent-ascent (Alt-GDA) in minimax games and show that Alt-GDA is superior to its simultaneous counterpart (Sim-GDA) in many settings. We prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave (SCSC) problems while Sim-GDA converges at a much slower rate. To our knowledge, this is the first result in any setting showing that Alt-GDA converges faster than Sim-GDA by more than a constant. We further adapt the theory of integral quadratic constraints (IQC) and show that Alt-GDA attains the same rate globally for a subclass of SCSC minimax problems. Empirically, we demonstrate that alternating updates speed up GAN training significantly and the use of optimism only helps for simultaneous algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhang22e.html
  PDF: https://proceedings.mlr.press/v151/zhang22e/zhang22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhang22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Guodong
    family: Zhang
  - given: Yuanhao
    family: Wang
  - given: Laurent
    family: Lessard
  - given: Roger B.
    family: Grosse
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7659-7679
  id: zhang22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7659
  lastpage: 7679
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal Compression of Locally Differentially Private Mechanisms '
  abstract: ' Compressing the output of $\epsilon$-locally differentially private (LDP) randomizers naively leads to suboptimal utility. In this work, we demonstrate the benefits of using schemes that jointly compress and privatize the data using shared randomness. In particular, we investigate a family of schemes based on Minimal Random Coding (Havasi et al., 2019) and prove that they offer optimal privacy-accuracy-communication tradeoffs. Our theoretical and empirical findings show that our approach can compress PrivUnit (Bhowmick et al., 2018) and Subset Selection (Ye et al., 2018), the best known LDP algorithms for mean and frequency estimation, to the order of $\epsilon$ bits of communication while preserving their privacy and accuracy guarantees. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shah22b.html
  PDF: https://proceedings.mlr.press/v151/shah22b/shah22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shah22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Abhin
    family: Shah
  - given: Wei-Ning
    family: Chen
  - given: Johannes
    family: Ballé
  - given: Peter
    family: Kairouz
  - given: Lucas
    family: Theis
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7680-7723
  id: shah22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7680
  lastpage: 7723
  published: 2022-05-03 00:00:00 +0000
- title: ' Hierarchical Bayesian Bandits '
  abstract: ' Meta-, multi-task, and federated learning can all be viewed as solving similar tasks, drawn from a distribution that reflects task similarities. We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit. We propose and analyze a natural hierarchical Thompson sampling algorithm (HierTS) for this class of problems. Our regret bounds hold for many variants of the problems, including when the tasks are solved sequentially or in parallel, and show that the regret decreases with a more informative prior. Our proofs rely on a novel total variance decomposition that can be applied beyond our models. Our theory is complemented by experiments, which show that the hierarchy helps with knowledge sharing among the tasks. This confirms that hierarchical Bayesian bandits are a universal and statistically-efficient tool for learning to act with similar bandit tasks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hong22c.html
  PDF: https://proceedings.mlr.press/v151/hong22c/hong22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hong22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Joey
    family: Hong
  - given: Branislav
    family: Kveton
  - given: Manzil
    family: Zaheer
  - given: Mohammad
    family: Ghavamzadeh
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7724-7741
  id: hong22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7724
  lastpage: 7741
  published: 2022-05-03 00:00:00 +0000
- title: ' Complex Momentum for Optimization in Games '
  abstract: ' We generalize gradient descent with momentum for optimization in differentiable games to have complex-valued momentum. We give theoretical motivation for our method by proving convergence on bilinear zero-sum games for simultaneous and alternating updates. Our method gives real-valued parameter updates, making it a drop-in replacement for standard optimizers. We empirically demonstrate that complex-valued momentum can improve convergence in realistic adversarial games, like generative adversarial networks, by showing we can find better solutions with an almost identical computational cost. We also show a practical complex-valued Adam variant, which we use to train BigGAN to improve inception scores on CIFAR-10. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lorraine22a.html
  PDF: https://proceedings.mlr.press/v151/lorraine22a/lorraine22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lorraine22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jonathan P.
    family: Lorraine
  - given: David
    family: Acuna
  - given: Paul
    family: Vicol
  - given: David
    family: Duvenaud
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7742-7765
  id: lorraine22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7742
  lastpage: 7765
  published: 2022-05-03 00:00:00 +0000
- title: ' Asymptotically Optimal Locally Private Heavy Hitters via Parameterized Sketches '
  abstract: ' We study the frequency estimation problem under the local differential privacy model. Frequency estimation is a fundamental computational question, and differential privacy has become the de-facto standard, with the local version (LDP) affording even greater protection. On large input domains, sketching methods and hierarchical search methods are commonly and successfully applied in practice for reducing the size of the domain and for identifying frequent elements. It is therefore of interest whether the current theoretical analysis of such algorithms is tight, or whether we can obtain algorithms in a similar vein that achieve optimal error guarantees. We introduce two algorithms for LDP frequency estimation. One solves the fundamental frequency oracle problem; the other solves the well-known heavy hitters identification problem. As a function of failure probability, $\beta$, the former achieves optimal worst-case estimation error for every $\beta$; the latter is optimal when $\beta$ is at least inverse polynomial in $n$, the number of users. In each algorithm, server running time and memory usage are $\tilde{O}(n)$ and $\tilde{O}(\sqrt{n})$, respectively, while user running time and memory usage are both $\tilde{O}(1)$. Our frequency-oracle algorithm achieves lower estimation error than Bassily et al. (NeurIPS 2017). On the other hand, our heavy hitters identification method improves the worst-case error of TreeHist (ibid) by a factor of $\Omega(\sqrt{\log n})$; it avoids invoking error-correcting codes, known to be theoretically powerful, but yet to be implemented. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wu22e.html
  PDF: https://proceedings.mlr.press/v151/wu22e/wu22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wu22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hao
    family: Wu
  - given: Anthony
    family: Wirth
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7766-7798
  id: wu22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7766
  lastpage: 7798
  published: 2022-05-03 00:00:00 +0000
- title: ' Tuning-Free Generalized Hamiltonian Monte Carlo '
  abstract: ' Hamiltonian Monte Carlo (HMC) has become a go-to family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference problems, in part because we have good procedures for automatically tuning its parameters. Much less attention has been paid to automatic tuning of generalized HMC (GHMC), in which the auxiliary momentum vector is partially updated frequently instead of being completely resampled infrequently. Since GHMC spreads progress over many iterations, it is not straightforward to tune GHMC based on quantities typically used to tune HMC such as average acceptance rate and squared jumped distance. In this work, we propose an ensemble-chain adaptation (ECA) algorithm for GHMC that automatically selects values for all of GHMC’s tunable parameters each iteration based on statistics collected from a population of many chains. This algorithm is designed to make good use of SIMD hardware accelerators such as GPUs, allowing most chains to be updated in parallel each iteration. Unlike typical adaptive-MCMC algorithms, our ECA algorithm does not perturb the chain’s stationary distribution, and therefore does not need to be “frozen” after warmup. Empirically, we find that the proposed algorithm quickly converges to its stationary distribution, producing accurate estimates of posterior expectations with relatively few gradient evaluations per chain. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hoffman22a.html
  PDF: https://proceedings.mlr.press/v151/hoffman22a/hoffman22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hoffman22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matthew D.
    family: Hoffman
  - given: Pavel
    family: Sountsov
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7799-7813
  id: hoffman22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7799
  lastpage: 7813
  published: 2022-05-03 00:00:00 +0000
- title: ' Federated Functional Gradient Boosting '
  abstract: ' Motivated by the tremendous success of boosting methods in the standard centralized model of learning, we initiate the theory of boosting in the Federated Learning setting. The primary challenges in the Federated Learning setting are heterogeneity in client data and the requirement that no client data can be transmitted to the server. We develop federated functional gradient boosting (FFGB), an algorithm that is designed to handle these challenges. Under appropriate assumptions on the weak learning oracle, the FFGB algorithm is proved to efficiently converge to certain neighborhoods of the global optimum. The radii of these neighborhoods depend upon the level of heterogeneity measured via the total variation distance and the much tighter Wasserstein-1 distance, and diminish to zero as the setting becomes more homogeneous. In practice, as suggested by our theoretical findings, we propose using FFGB to warm-start existing Federated Learning solvers and observe a significant performance boost in highly heterogeneous settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shen22a.html
  PDF: https://proceedings.mlr.press/v151/shen22a/shen22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shen22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zebang
    family: Shen
  - given: Hamed
    family: Hassani
  - given: Satyen
    family: Kale
  - given: Amin
    family: Karbasi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7814-7840
  id: shen22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7814
  lastpage: 7840
  published: 2022-05-03 00:00:00 +0000
- title: ' Generalised GPLVM with Stochastic Variational Inference '
  abstract: ' Gaussian process latent variable models (GPLVM) are a flexible and non-linear approach to dimensionality reduction, extending classical Gaussian processes to an unsupervised learning context. The Bayesian incarnation of the GPLVM uses a variational framework, where the posterior over latent variables is approximated by a well-behaved variational family, a factorised Gaussian yielding a tractable lower bound. However, the non-factorisability of the lower bound prevents truly scalable inference. In this work, we study the doubly stochastic formulation of the Bayesian GPLVM model, which is amenable to minibatch training. We show how this framework is compatible with different latent variable formulations and perform experiments to compare a suite of models. Further, we demonstrate how we can train in the presence of massively missing data and obtain high-fidelity reconstructions. We demonstrate the model’s performance by benchmarking against the canonical sparse GPLVM on high-dimensional data examples. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/lalchand22a.html
  PDF: https://proceedings.mlr.press/v151/lalchand22a/lalchand22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-lalchand22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vidhi
    family: Lalchand
  - given: Aditya
    family: Ravuri
  - given: Neil D.
    family: Lawrence
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7841-7864
  id: lalchand22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7841
  lastpage: 7864
  published: 2022-05-03 00:00:00 +0000
- title: ' Stochastic Extragradient: General Analysis and Improved Rates '
  abstract: ' The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequality problems (VIP) appearing in various machine learning tasks. However, several important questions regarding the convergence properties of SEG are still open, including the sampling of stochastic gradients, mini-batching, convergence guarantees for the monotone finite-sum variational inequalities with possibly non-monotone terms, and others. To address these questions, in this paper, we develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner. Besides standard setups, like Same-Sample SEG under Lipschitzness and monotonicity or Independent-Samples SEG under uniformly bounded variance, our approach allows us to analyze variants of SEG that were never explicitly considered in the literature before. Notably, we analyze SEG with arbitrary sampling, which includes importance sampling and various mini-batching strategies as special cases. Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/gorbunov22b.html
  PDF: https://proceedings.mlr.press/v151/gorbunov22b/gorbunov22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-gorbunov22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Eduard
    family: Gorbunov
  - given: Hugo
    family: Berard
  - given: Gauthier
    family: Gidel
  - given: Nicolas
    family: Loizou
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7865-7901
  id: gorbunov22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7865
  lastpage: 7901
  published: 2022-05-03 00:00:00 +0000
- title: ' Deep Non-crossing Quantiles through the Partial Derivative '
  abstract: ' Quantile Regression (QR) provides a way to approximate a single conditional quantile. To have a more informative description of the conditional distribution, QR can be merged with deep learning techniques to simultaneously estimate multiple quantiles. However, the minimisation of the QR-loss function does not guarantee non-crossing quantiles, which affects the validity of such predictions and introduces a critical issue in certain scenarios. In this article, we propose a generic deep learning algorithm for predicting an arbitrary number of quantiles that ensures the quantile monotonicity constraint up to the machine precision and maintains its modelling performance with respect to alternative models. The presented method is evaluated over several real-world datasets obtaining state-of-the-art results as well as showing that it scales to large-size data sets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/brando22a.html
  PDF: https://proceedings.mlr.press/v151/brando22a/brando22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-brando22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Axel
    family: Brando
  - given: Joan
    family: Gimeno
  - given: Jose
    family: Rodriguez-Serrano
  - given: Jordi
    family: Vitria
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7902-7914
  id: brando22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7902
  lastpage: 7914
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal channel selection with discrete QCQP '
  abstract: ' Reducing the high computational cost of large convolutional neural networks is crucial when deploying the networks to resource-constrained environments. We first show that the greedy approach of recent channel pruning methods ignores the inherent quadratic coupling between channels in the neighboring layers and cannot safely remove inactive weights during the pruning procedure. Furthermore, due to these inactive weights, the greedy methods cannot guarantee that the given resource constraints are satisfied and deviate from the true objective. In this regard, we propose a novel channel selection method that optimally selects channels via discrete QCQP, which provably prevents any inactive weights and is guaranteed to meet the resource constraints tightly in terms of FLOPs, memory usage, and network size. We also propose a quadratic model that accurately estimates the actual inference time of the pruned network, which allows us to adopt inference time as a resource constraint option. Furthermore, we generalize our method to extend the selection granularity beyond channels and handle non-sequential connections. Our experiments on CIFAR-10 and ImageNet show that our proposed pruning method outperforms other fixed-importance channel pruning methods on various network architectures. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/jeong22a.html
  PDF: https://proceedings.mlr.press/v151/jeong22a/jeong22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-jeong22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yeonwoo
    family: Jeong
  - given: Deokjae
    family: Lee
  - given: Gaon
    family: An
  - given: Changyong
    family: Son
  - given: Hyun
    family: Oh Song
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7915-7941
  id: jeong22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7915
  lastpage: 7941
  published: 2022-05-03 00:00:00 +0000
- title: ' Vanishing Curvature in Randomly Initialized Deep ReLU Networks '
  abstract: ' Deep ReLU networks are at the basis of many modern neural architectures. Yet, the loss landscape of such networks and its interaction with state-of-the-art optimizers is not fully understood. One of the most crucial aspects is the landscape at random initialization, which often influences convergence speed dramatically. In their seminal works, Xavier & Bengio, 2010 and He et al., 2015 propose an initialization strategy that is supposed to prevent gradients from vanishing. Yet, we identify some shortcomings of their expectation analysis as network depth increases, and show that the proposed initialization can actually fail to deliver stable gradient norms. More precisely, by leveraging an in-depth analysis of the median of the forward pass, we first show that, with high probability, vanishing gradients cannot be circumvented when the network width scales with less than O(depth). Second, we extend this analysis to second-order derivatives and show that random i.i.d. initialization also gives rise to Hessian matrices with eigenspectra that vanish as networks grow in depth. Whenever this happens, optimizers are initialized in a very flat, saddle point-like plateau, which is particularly hard to escape with stochastic gradient descent (SGD) as its escaping time is inversely related to curvature magnitudes. We believe that this observation is crucial for fully understanding (a) the historical difficulties of training deep nets with vanilla SGD and (b) the success of adaptive gradient methods, which naturally adapt to curvature and thus quickly escape flat plateaus. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/orvieto22a.html
  PDF: https://proceedings.mlr.press/v151/orvieto22a/orvieto22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-orvieto22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Antonio
    family: Orvieto
  - given: Jonas
    family: Kohler
  - given: Dario
    family: Pavllo
  - given: Thomas
    family: Hofmann
  - given: Aurelien
    family: Lucchi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7942-7975
  id: orvieto22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7942
  lastpage: 7975
  published: 2022-05-03 00:00:00 +0000
- title: ' Fast and Scalable Spike and Slab Variable Selection in High-Dimensional Gaussian Processes '
  abstract: ' Variable selection in Gaussian processes (GPs) is typically undertaken by thresholding the inverse lengthscales of automatic relevance determination kernels, but in high-dimensional datasets this approach can be unreliable. A more probabilistically principled alternative is to use spike and slab priors and infer a posterior probability of variable inclusion. However, existing implementations in GPs are very costly to run in both high-dimensional and large-n datasets, or are only suitable for unsupervised settings with specific kernels. As such, we develop a fast and scalable variational inference algorithm for the spike and slab GP that is tractable with arbitrary differentiable kernels. We improve our algorithm’s ability to adapt to the sparsity of relevant variables by Bayesian model averaging over hyperparameters, and achieve substantial speed ups using zero temperature posterior restrictions, dropout pruning and nearest neighbour minibatching. In experiments our method consistently outperforms vanilla and sparse variational GPs whilst retaining similar runtimes (even when n=10^6) and performs competitively with a spike and slab GP using MCMC but runs up to 1000 times faster. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dance22a.html
  PDF: https://proceedings.mlr.press/v151/dance22a/dance22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dance22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hugh
    family: Dance
  - given: Brooks
    family: Paige
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 7976-8002
  id: dance22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 7976
  lastpage: 8002
  published: 2022-05-03 00:00:00 +0000
- title: ' Self-training Converts Weak Learners to Strong Learners in Mixture Models '
  abstract: ' We consider a binary classification problem when the data comes from a mixture of two rotationally symmetric distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\beta_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\beta_0 := \beta_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \beta_t, \mathbf{x} \rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\beta_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/frei22a.html
  PDF: https://proceedings.mlr.press/v151/frei22a/frei22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-frei22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Spencer
    family: Frei
  - given: Difan
    family: Zou
  - given: Zixiang
    family: Chen
  - given: Quanquan
    family: Gu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8003-8021
  id: frei22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8003
  lastpage: 8021
  published: 2022-05-03 00:00:00 +0000
- title: ' Two-Sample Test with Kernel Projected Wasserstein Distance '
  abstract: ' We develop a kernel projected Wasserstein distance for the two-sample test, an essential building block in statistics and machine learning: given two sets of samples, to determine whether they are from the same distribution. This method operates by finding the nonlinear mapping in the data space which maximizes the distance between projected distributions. In contrast to existing works about projected Wasserstein distance, the proposed method circumvents the curse of dimensionality more efficiently. We present practical algorithms for computing this distance function together with the non-asymptotic uncertainty quantification of empirical estimates. Numerical examples validate our theoretical results and demonstrate good performance of the proposed method. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22f.html
  PDF: https://proceedings.mlr.press/v151/wang22f/wang22f.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jie
    family: Wang
  - given: Rui
    family: Gao
  - given: Yao
    family: Xie
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8022-8055
  id: wang22f
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8022
  lastpage: 8055
  published: 2022-05-03 00:00:00 +0000
- title: ' Mode estimation on matrix manifolds: Convergence and robustness '
  abstract: ' Data on matrix manifolds are ubiquitous in a wide range of research fields. A key issue is estimation of the modes (i.e., maxima) of the probability density function underlying the data. For instance, local modes (i.e., local maxima) can be used for clustering, while the global mode (i.e., the global maximum) is a robust alternative to the Frechet mean. Previously, an iterative method for estimating the modes was proposed based on a Riemannian gradient estimator and was empirically shown to achieve superior performance in clustering (Ashizawa et al., 2017). However, it has not been theoretically investigated whether the iterative method is able to capture the modes based on the gradient estimator. In this paper, we propose simple iterative methods for mode estimation on matrix manifolds based on the Euclidean metric. A key contribution is to perform theoretical analysis and establish sufficient conditions for the monotonic ascending property and convergence of the proposed iterative methods. In addition, for the previous method, we prove the monotonic ascending property towards a mode. Thus, our work can also be regarded as compensating for the lack of theoretical analysis in the previous method. Furthermore, the robustness of the iterative methods is theoretically investigated in terms of the breakdown point. Finally, the proposed methods are experimentally demonstrated to work well in clustering and robust mode estimation on matrix manifolds. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sasaki22a.html
  PDF: https://proceedings.mlr.press/v151/sasaki22a/sasaki22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sasaki22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hiroaki
    family: Sasaki
  - given: Jun-Ichiro
    family: Hirayama
  - given: Takafumi
    family: Kanamori
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8056-8079
  id: sasaki22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8056
  lastpage: 8079
  published: 2022-05-03 00:00:00 +0000
- title: ' A Dimensionality Reduction Method for Finding Least Favorable Priors with a Focus on Bregman Divergence '
  abstract: ' A common way of characterizing minimax estimators in point estimation is by moving the problem into the Bayesian estimation domain and finding a least favorable prior distribution. The Bayesian estimator induced by a least favorable prior, under mild conditions, is then known to be minimax. However, finding least favorable distributions can be challenging due to inherent optimization over the space of probability distributions, which is infinite-dimensional. This paper develops a dimensionality reduction method that allows us to move the optimization to a finite-dimensional setting with an explicit bound on the dimension. The benefit of this dimensionality reduction is that it permits the use of popular algorithms such as projected gradient ascent to find least favorable priors. Throughout the paper, in order to make progress on the problem, we restrict ourselves to Bayesian risks induced by a relatively large class of loss functions, namely Bregman divergences. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dytso22a.html
  PDF: https://proceedings.mlr.press/v151/dytso22a/dytso22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dytso22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alex R.
    family: Dytso
  - given: Mario
    family: Goldenbaum
  - given: H.
    family: Vincent Poor
  - given: Shlomo
    family: Shamai
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8080-8094
  id: dytso22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8080
  lastpage: 8094
  published: 2022-05-03 00:00:00 +0000
- title: ' Towards Federated Bayesian Network Structure Learning with Continuous Optimization '
  abstract: ' Traditionally, Bayesian network structure learning is often carried out at a central site, in which all data is gathered. However, in practice, data may be distributed across different parties (e.g., companies, devices) who intend to collectively learn a Bayesian network, but are not willing to disclose information related to their data owing to privacy or security concerns. In this work, we present a federated learning approach to estimate the structure of a Bayesian network from data that is horizontally partitioned across different parties. We develop a distributed structure learning method based on continuous optimization, using the alternating direction method of multipliers (ADMM), such that only the model parameters have to be exchanged during the optimization process. We demonstrate the flexibility of our approach by adopting it for both linear and nonlinear cases. Experimental results on synthetic and real datasets show that it achieves an improved performance over the other methods, especially when there is a relatively large number of clients and each has a limited sample size. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ng22a.html
  PDF: https://proceedings.mlr.press/v151/ng22a/ng22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ng22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ignavier
    family: Ng
  - given: Kun
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8095-8111
  id: ng22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8095
  lastpage: 8111
  published: 2022-05-03 00:00:00 +0000
- title: ' Nuances in Margin Conditions Determine Gains in Active Learning '
  abstract: ' We consider nonparametric classification with smooth regression functions, where it is well known that notions of margin in E[Y|X] determine fast or slow rates in both active and passive learning. Here we elucidate a striking distinction between the two settings. Namely, we show that some seemingly benign nuances in notions of margin - involving the uniqueness of the Bayes classifier, and which have no apparent effect on rates in passive learning - determine whether or not any active learner can outperform passive learning rates. In particular, for Audibert-Tsybakov’s margin condition (allowing general situations with non-unique Bayes classifiers), no active learner can gain over passive learning in commonly studied settings where the marginal on X is near uniform. Our results thus negate the usual intuition from past literature that active rates should improve over passive rates in nonparametric settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kpotufe22a.html
  PDF: https://proceedings.mlr.press/v151/kpotufe22a/kpotufe22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kpotufe22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Samory
    family: Kpotufe
  - given: Gan
    family: Yuan
  - given: Yunfan
    family: Zhao
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8112-8126
  id: kpotufe22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8112
  lastpage: 8126
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Quantile Functions without Quantile Crossing for Distribution-free Time Series Forecasting '
  abstract: ' Quantile regression is an effective technique to quantify uncertainty, fit challenging underlying distributions, and often provide full probabilistic predictions through joint learnings over multiple quantile levels. A common drawback of these joint quantile regressions, however, is quantile crossing, which violates the desirable monotone property of the conditional quantile function. In this work, we propose the Incremental (Spline) Quantile Functions I(S)QF, a flexible and efficient distribution-free quantile estimation framework that resolves quantile crossing with a simple neural network layer. Moreover, I(S)QF can interpolate and extrapolate to predict arbitrary quantile levels that differ from the underlying training ones. Equipped with the analytical evaluation of the continuous ranked probability score of I(S)QF representations, we apply our methods to NN-based time series forecasting cases, where the savings in expensive re-training costs for non-trained quantile levels are particularly significant. We also provide a generalization error analysis of our proposed approaches under the sequence-to-sequence setting. Lastly, extensive experiments demonstrate the improvement of consistency and accuracy errors over other baselines. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/park22a.html
  PDF: https://proceedings.mlr.press/v151/park22a/park22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-park22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Youngsuk
    family: Park
  - given: Danielle
    family: Maddix
  - given: François-Xavier
    family: Aubet
  - given: Kelvin
    family: Kan
  - given: Jan
    family: Gasthaus
  - given: Yuyang
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8127-8150
  id: park22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8127
  lastpage: 8150
  published: 2022-05-03 00:00:00 +0000
- title: ' Convergence of Langevin Monte Carlo in Chi-Squared and Rényi Divergence '
  abstract: ' We study sampling from a target distribution $\nu_* = e^{-f}$ using the unadjusted Langevin Monte Carlo (LMC) algorithm when the potential $f$ satisfies a strong dissipativity condition and it is first-order smooth with a Lipschitz gradient. We prove that, initialized with a Gaussian random vector that has sufficiently small variance, iterating the LMC algorithm for $\widetilde{\mathcal{O}}(\lambda^2 d\epsilon^{-1})$ steps is sufficient to reach $\epsilon$-neighborhood of the target in both Chi-squared and Rényi divergence, where $\lambda$ is the logarithmic Sobolev constant of $\nu_*$. Our results do not require warm-start to deal with the exponential dimension dependency in Chi-squared divergence at initialization. In particular, for strongly convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$ which improves the previously known rates in both of these metrics, under the same assumptions. Translating this rate to other metrics, our results also recover the state-of-the-art rate estimates in KL divergence, total variation and $2$-Wasserstein distance in the same setup. Finally, as we rely on the logarithmic Sobolev inequality, our framework covers a range of non-convex potentials that are first-order smooth and exhibit strong convexity outside of a compact region. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/erdogdu22a.html
  PDF: https://proceedings.mlr.press/v151/erdogdu22a/erdogdu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-erdogdu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Murat A.
    family: Erdogdu
  - given: Rasa
    family: Hosseinzadeh
  - given: Shunshi
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8151-8175
  id: erdogdu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8151
  lastpage: 8175
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Convergence of Continuous Constrained Optimization for Structure Learning '
  abstract: ' Recently, structure learning of directed acyclic graphs (DAGs) has been formulated as a continuous optimization problem by leveraging an algebraic characterization of acyclicity. The constrained problem is solved using the augmented Lagrangian method (ALM) which is often preferred to the quadratic penalty method (QPM) by virtue of its standard convergence result that does not require the penalty coefficient to go to infinity, hence avoiding ill-conditioning. However, the convergence properties of these methods for structure learning, including whether they are guaranteed to return a DAG solution, remain unclear, which might limit their practical applications. In this work, we examine the convergence of ALM and QPM for structure learning in the linear, nonlinear, and confounded cases. We show that the standard convergence result of ALM does not hold in these settings, and demonstrate empirically that its behavior is akin to that of the QPM which is prone to ill-conditioning. We further establish the convergence guarantee of QPM to a DAG solution, under mild conditions. Lastly, we connect our theoretical results with existing approaches to help resolve the convergence issue, and verify our findings in light of an empirical comparison of them. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ng22b.html
  PDF: https://proceedings.mlr.press/v151/ng22b/ng22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ng22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ignavier
    family: Ng
  - given: Sebastien
    family: Lachapelle
  - given: Nan
    family: Rosemary Ke
  - given: Simon
    family: Lacoste-Julien
  - given: Kun
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8176-8198
  id: ng22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8176
  lastpage: 8198
  published: 2022-05-03 00:00:00 +0000
- title: ' Hardness of Learning a Single Neuron with Adversarial Label Noise '
  abstract: ' We study the problem of distribution-free learning of a single neuron under adversarial label noise with respect to the squared loss. For a wide range of activation functions, including ReLUs and sigmoids, we prove hardness of learning results in the Statistical Query model and under a well-studied assumption on the complexity of refuting XOR formulas. Specifically, we establish that no polynomial-time learning algorithm, even improper, can approximate the optimal loss value within any constant factor. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/diakonikolas22a.html
  PDF: https://proceedings.mlr.press/v151/diakonikolas22a/diakonikolas22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-diakonikolas22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ilias
    family: Diakonikolas
  - given: Daniel
    family: Kane
  - given: Pasin
    family: Manurangsi
  - given: Lisheng
    family: Ren
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8199-8213
  id: diakonikolas22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8199
  lastpage: 8213
  published: 2022-05-03 00:00:00 +0000
- title: ' Firebolt: Weak Supervision Under Weaker Assumptions '
  abstract: ' Modern machine learning demands a large amount of training data. Weak supervision is a promising approach to meet this demand. It aggregates multiple labeling functions (LFs), which are noisy, user-provided labeling heuristics, to rapidly and cheaply curate probabilistic labels for large-scale unlabeled data. However, standard assumptions in weak supervision (such as user-specified class balance, similar accuracy of an LF in classifying different classes, and full knowledge of LF dependency at inference time) might be undesirable in practice. In response, we present Firebolt, a new weak supervision framework that seeks to operate under weaker assumptions. In particular, Firebolt learns the class balance and class-specific accuracy of LFs jointly from unlabeled data. It carries out inference in an efficient and interpretable manner. We analyze the parameter estimation error of Firebolt and characterize its impact on downstream model performance. Furthermore, we show that on five publicly available datasets, Firebolt outperforms a state-of-the-art weak supervision method by up to 5.8 points in AUC. We also provide a case study in the production setting of a tech company, where a Firebolt-supervised model outperforms the existing weakly-supervised production model by 1.3 points in AUC and speeds up label model training and inference from one hour to three minutes. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kuang22a.html
  PDF: https://proceedings.mlr.press/v151/kuang22a/kuang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kuang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zhaobin
    family: Kuang
  - given: Chidubem G.
    family: Arachie
  - given: Bangyong
    family: Liang
  - given: Pradyumna
    family: Narayana
  - given: Giulia
    family: Desalvo
  - given: Michael S.
    family: Quinn
  - given: Bert
    family: Huang
  - given: Geoffrey
    family: Downs
  - given: Yang
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8214-8259
  id: kuang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8214
  lastpage: 8259
  published: 2022-05-03 00:00:00 +0000
- title: ' Relational Neural Markov Random Fields '
  abstract: ' Statistical Relational Learning (SRL) models have attracted significant attention due to their ability to model complex data while handling uncertainty. However, most of these models have been restricted to discrete domains owing to the complexity of inference in continuous domains. In this work, we introduce Relational Neural Markov Random Fields (RN-MRFs) that allow handling of complex relational hybrid domains, i.e., those that include discrete and continuous quantities, and we propose a maximum pseudolikelihood estimation-based learning algorithm with importance sampling for training the neural potential parameters. The key advantage of our approach is that it makes minimal data distributional assumptions and can seamlessly embed human knowledge through potentials or relational rules. Our empirical evaluations across diverse domains, such as image processing and relational object mapping, demonstrate its practical utility. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22f.html
  PDF: https://proceedings.mlr.press/v151/chen22f/chen22f.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuqiao
    family: Chen
  - given: Sriraam
    family: Natarajan
  - given: Nicholas
    family: Ruozzi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8260-8269
  id: chen22f
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8260
  lastpage: 8269
  published: 2022-05-03 00:00:00 +0000
- title: ' PACm-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime '
  abstract: ' The Bayesian posterior minimizes the "inferential risk" which itself bounds the "predictive risk." This bound is tight when the likelihood and prior are well-specified. However, since misspecification induces a gap, the Bayesian posterior predictive distribution may have poor generalization performance. This work develops a multi-sample loss (PAC$^m$) which can close the gap by spanning a trade-off between the two risks. The loss is computationally favorable and offers PAC generalization guarantees. Empirical study demonstrates improvement to the predictive distribution. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/morningstar22a.html
  PDF: https://proceedings.mlr.press/v151/morningstar22a/morningstar22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-morningstar22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Warren R.
    family: Morningstar
  - given: Alex
    family: Alemi
  - given: Joshua V.
    family: Dillon
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8270-8298
  id: morningstar22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8270
  lastpage: 8298
  published: 2022-05-03 00:00:00 +0000
- title: ' Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape '
  abstract: ' In this paper, we study the sharpness of a deep learning (DL) loss landscape around local minima in order to reveal systematic mechanisms underlying the generalization abilities of DL models. Our analysis is performed across varying network and optimizer hyper-parameters, and involves a rich family of different sharpness measures. We compare these measures and show that the low-pass filter based measure exhibits the highest correlation with the generalization abilities of DL models, has high robustness to both data and label noise, and furthermore can track the double descent behavior for neural networks. We next derive an optimization algorithm, relying on the low-pass filter (LPF), that actively searches for flat regions in the DL optimization landscape using an SGD-like procedure. The update of the proposed algorithm, which we call LPF-SGD, is determined by the gradient of the convolution of the filter kernel with the loss function and can be efficiently computed using MC sampling. We empirically show that our algorithm achieves superior generalization performance compared to the common DL training strategies. On the theoretical front, we prove that LPF-SGD converges to a better optimal point with smaller generalization error than SGD. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bisla22a.html
  PDF: https://proceedings.mlr.press/v151/bisla22a/bisla22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bisla22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Devansh
    family: Bisla
  - given: Jing
    family: Wang
  - given: Anna
    family: Choromanska
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8299-8339
  id: bisla22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8299
  lastpage: 8339
  published: 2022-05-03 00:00:00 +0000
- title: ' Online Control of the False Discovery Rate under "Decision Deadlines" '
  abstract: ' Online testing procedures aim to control the extent of false discoveries over a sequence of hypothesis tests, allowing for the possibility that early-stage test results influence the choice of hypotheses to be tested in later stages. Typically, online methods assume that a permanent decision regarding the current test (reject or not reject) must be made before advancing to the next test. We instead assume that each hypothesis requires an immediate preliminary decision, but also allows us to update that decision until a preset deadline. Roughly speaking, this lets us apply a Benjamini-Hochberg-type procedure over a moving window of hypotheses, where the threshold parameters for upcoming tests can be determined based on preliminary results. We show that our approach can control the false discovery rate (FDR) at every stage of testing, even under arbitrary p-value dependencies. That said, our approach offers much greater flexibility if the p-values exhibit a known independence structure. For example, if the p-value sequence is finite and all p-values are independent, then we can additionally control FDR at adaptively chosen stopping times. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/fisher22a.html
  PDF: https://proceedings.mlr.press/v151/fisher22a/fisher22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-fisher22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Aaron J.
    family: Fisher
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8340-8359
  id: fisher22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8340
  lastpage: 8359
  published: 2022-05-03 00:00:00 +0000
- title: ' Tile Networks: Learning Optimal Geometric Layout for Whole-page Recommendation '
  abstract: ' Finding optimal configurations in a geometric space is a key challenge in many technological disciplines. Current approaches rely heavily on human domain expertise and are difficult to scale. In this paper we show it is possible to solve configuration optimization problems for whole-page recommendation using reinforcement learning. The proposed Tile Networks is a neural architecture that optimizes 2D geometric configurations by arranging items in proper positions. Empirical results on a real dataset demonstrate its superior performance compared to traditional learning-to-rank approaches and recent deep models. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/xiao22a.html
  PDF: https://proceedings.mlr.press/v151/xiao22a/xiao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-xiao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shuai
    family: Xiao
  - given: Zaifan
    family: Jiang
  - given: Shuang
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8360-8369
  id: xiao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8360
  lastpage: 8369
  published: 2022-05-03 00:00:00 +0000
- title: ' A Bayesian Approach for Stochastic Continuum-armed Bandit with Long-term Constraints '
  abstract: ' Despite many valuable advances in the domain of online convex optimization over the last decade, many machine learning and networking problems of interest do not fit into that framework due to their nonconvex objectives and the presence of constraints. This motivates us in this paper to go beyond convexity and study the problem of stochastic continuum-armed bandit with long-term constraints. For noiseless observations of constraint functions, we propose a generic method using a Bayesian approach based on a class of penalty functions, and prove that it can achieve a sublinear regret with respect to the global optimum and a sublinear constraint violation (CV), which can match the best results of previous methods. Additionally, we propose another method to deal with the case where constraint functions are observed with noise, which can achieve a sublinear regret and a sublinear CV with more assumptions. Finally, we use two experiments to compare our methods with two benchmark methods in online optimization and Bayesian optimization, which demonstrates the advantages of our algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shi22c.html
  PDF: https://proceedings.mlr.press/v151/shi22c/shi22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shi22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zai
    family: Shi
  - given: Atilla
    family: Eryilmaz
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8370-8391
  id: shi22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8370
  lastpage: 8391
  published: 2022-05-03 00:00:00 +0000
- title: ' Amortized Rejection Sampling in Universal Probabilistic Programming '
  abstract: ' Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove that our estimator has finite variance, empirically demonstrate our method’s correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops, and discuss how to implement our method in a generic probabilistic programming framework. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/naderiparizi22a.html
  PDF: https://proceedings.mlr.press/v151/naderiparizi22a/naderiparizi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-naderiparizi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Saeid
    family: Naderiparizi
  - given: Adam
    family: Scibior
  - given: Andreas
    family: Munk
  - given: Mehrdad
    family: Ghadiri
  - given: Atilim
    family: Gunes Baydin
  - given: Bradley J.
    family: Gram-Hansen
  - given: Christian A.
    family: Schroeder De Witt
  - given: Robert
    family: Zinkov
  - given: Philip
    family: Torr
  - given: Tom
    family: Rainforth
  - given: Yee
    family: Whye Teh
  - given: Frank
    family: Wood
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8392-8412
  id: naderiparizi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8392
  lastpage: 8412
  published: 2022-05-03 00:00:00 +0000
- title: ' The Curse of Passive Data Collection in Batch Reinforcement Learning '
  abstract: ' In high-stakes applications, active experimentation may be considered too risky and thus data are often collected passively. While in simple cases, such as in bandits, passive and active data collection are similarly effective, the price of passive sampling can be much higher when collecting data from a system with controlled states. The main focus of the current paper is the characterization of this price. For example, when learning in episodic finite state-action Markov decision processes (MDPs) with $S$ states and $A$ actions, we show that even with the best (but passively chosen) logging policy, $\Omega(A^{\min(S-1, H)}/\varepsilon^2)$ episodes are necessary (and sufficient) to obtain an $\varepsilon$-optimal policy, where $H$ is the length of episodes. Note that this shows that the sample complexity blows up exponentially compared to the case of active data collection, a result which is not unexpected, but, as far as we know, has not been published before, and perhaps the form of the exact expression is a little surprising. We also extend these results in various directions, such as other criteria or learning in the presence of function approximation, with similar conclusions. A remarkable feature of our result is the sharp characterization of the exponent that appears, which is critical for understanding what makes passive learning hard. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/xiao22b.html
  PDF: https://proceedings.mlr.press/v151/xiao22b/xiao22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-xiao22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chenjun
    family: Xiao
  - given: Ilbin
    family: Lee
  - given: Bo
    family: Dai
  - given: Dale
    family: Schuurmans
  - given: Csaba
    family: Szepesvari
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8413-8438
  id: xiao22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8413
  lastpage: 8438
  published: 2022-05-03 00:00:00 +0000
- title: ' Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization '
  abstract: ' We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. Existing CGM variants for this template either suffer from slow convergence rates, or require carefully increasing the batch size over the course of the algorithm’s execution, which leads to computing full gradients. In contrast, the proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques. In applications we put special emphasis on problems with a large number of separable constraints. Such problems are prevalent among semidefinite programming (SDP) formulations arising in machine learning and theoretical computer science. We provide numerical experiments on matrix completion, unsupervised clustering, and sparsest-cut SDPs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dresdner22a.html
  PDF: https://proceedings.mlr.press/v151/dresdner22a/dresdner22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dresdner22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Gideon
    family: Dresdner
  - given: Maria-Luiza
    family: Vladarean
  - given: Gunnar
    family: Rätsch
  - given: Francesco
    family: Locatello
  - given: Volkan
    family: Cevher
  - given: Alp
    family: Yurtsever
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8439-8457
  id: dresdner22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8439
  lastpage: 8457
  published: 2022-05-03 00:00:00 +0000
- title: ' Adversarial Tracking Control via Strongly Adaptive Online Learning with Memory '
  abstract: ' We consider the problem of tracking an adversarial state sequence in a linear dynamical system subject to adversarial disturbances and loss functions, generalizing earlier settings in the literature. To this end, we develop three techniques, each of independent interest. First, we propose a comparator-adaptive algorithm for online linear optimization with movement cost. Without tuning, it nearly matches the performance of the optimally tuned gradient descent in hindsight. Next, considering a related problem called online learning with memory, we construct a novel strongly adaptive algorithm that uses our first contribution as a building block. Finally, we present the first reduction from adversarial tracking control to strongly adaptive online learning with memory. Summarizing these individual techniques, we obtain an adversarial tracking controller with a strong performance guarantee even when the reference trajectory has a large range of movement. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhang22f.html
  PDF: https://proceedings.mlr.press/v151/zhang22f/zhang22f.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhang22f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zhiyu
    family: Zhang
  - given: Ashok
    family: Cutkosky
  - given: Ioannis
    family: Paschalidis
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8458-8492
  id: zhang22f
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8458
  lastpage: 8492
  published: 2022-05-03 00:00:00 +0000
- title: ' Look-Ahead Acquisition Functions for Bernoulli Level Set Estimation '
  abstract: ' Level set estimation (LSE) is the problem of identifying regions where an unknown function takes values above or below a specified threshold. Active sampling strategies for efficient LSE have primarily been studied in continuous-valued functions. Motivated by applications in human psychophysics where common experimental designs produce binary responses, we study LSE active sampling with Bernoulli outcomes. With Gaussian process classification surrogate models, the look-ahead model posteriors used by state-of-the-art continuous-output methods are intractable. However, we derive analytic expressions for look-ahead posteriors of sublevel set membership, and show how these lead to analytic expressions for a class of look-ahead LSE acquisition functions, including information-based methods. Benchmark experiments show the importance of considering the global look-ahead impact on the entire posterior. We demonstrate a clear benefit to using this new class of acquisition functions on benchmark problems, and on a challenging real-world task of estimating a high-dimensional contrast sensitivity function. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/letham22a.html
  PDF: https://proceedings.mlr.press/v151/letham22a/letham22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-letham22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Benjamin
    family: Letham
  - given: Phillip
    family: Guan
  - given: Chase
    family: Tymms
  - given: Eytan
    family: Bakshy
  - given: Michael
    family: Shvartsman
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8493-8513
  id: letham22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8493
  lastpage: 8513
  published: 2022-05-03 00:00:00 +0000
- title: ' Debiasing Samples from Online Learning Using Bootstrap '
  abstract: ' It has been recently shown in the literature (Nie et al., 2018; Shin et al., 2019a,b) that the sample averages from online learning experiments are biased when used to estimate the mean reward. To correct the bias, off-policy evaluation methods, including importance sampling and doubly robust estimators, typically calculate the conditional propensity score, which is ill-defined for non-randomized policies such as UCB. This paper provides a procedure to debias the samples using bootstrap, which doesn’t require knowledge of the reward distribution and can be applied to any adaptive policy. Numerical experiments demonstrate the effective bias reduction for samples generated by popular multi-armed bandit algorithms such as Explore-Then-Commit (ETC), UCB, Thompson sampling (TS) and $\epsilon$-greedy (EG). We analyze and provide theoretical justifications for the procedure under the ETC algorithm, including the asymptotic convergence of the bias decay rate in the real and bootstrap worlds. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22g.html
  PDF: https://proceedings.mlr.press/v151/chen22g/chen22g.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22g.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ningyuan
    family: Chen
  - given: Xuefeng
    family: Gao
  - given: Yi
    family: Xiong
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8514-8533
  id: chen22g
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8514
  lastpage: 8533
  published: 2022-05-03 00:00:00 +0000
- title: ' Convergence of online k-means '
  abstract: ' We prove asymptotic convergence for a general class of k-means algorithms performed over streaming data from a distribution: the centers asymptotically converge to the set of stationary points of the k-means objective function. To do so, we show that online k-means over a distribution can be interpreted as stochastic gradient descent with a stochastic learning rate schedule. Then, we prove convergence by extending techniques used in optimization literature to handle settings where center-specific learning rates may depend on the past trajectory of the centers. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/so22a.html
  PDF: https://proceedings.mlr.press/v151/so22a/so22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-so22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Geelon
    family: So
  - given: Gaurav
    family: Mahajan
  - given: Sanjoy
    family: Dasgupta
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8534-8569
  id: so22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8534
  lastpage: 8569
  published: 2022-05-03 00:00:00 +0000
- title: ' Reconstructing Test Labels from Noisy Loss Functions '
  abstract: ' Machine learning classifiers rely on loss functions for performance evaluation, often on a private (hidden) dataset. In a recent line of research, label inference was introduced as the problem of reconstructing the ground truth labels of this private dataset from just the (possibly perturbed) cross-entropy loss function values evaluated at chosen prediction vectors (without any other access to the hidden dataset). In this paper, we formally study the necessary and sufficient conditions under which label inference is possible from any (noisy) loss function value. Using tools from analytical number theory, we show that, for a broad class of commonly used loss functions, including general Bregman divergence-based losses and multiclass cross-entropy with common activation functions like sigmoid and softmax, it is possible to design label inference attacks that succeed even for arbitrary noise levels and using only a single query from the adversary. We formally study the computational complexity of label inference and show that while in general, designing adversarial prediction vectors for these attacks is co-NP-hard, once we have these vectors, the attacks can also be carried out through a lightweight augmentation to any neural network model, making them look benign and hard to detect. The observations in this paper provide a deeper understanding of the vulnerabilities inherent in modern machine learning and could be used for designing future trustworthy ML. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/aggarwal22a.html
  PDF: https://proceedings.mlr.press/v151/aggarwal22a/aggarwal22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-aggarwal22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Abhinav
    family: Aggarwal
  - given: Shiva
    family: Kasiviswanathan
  - given: Zekun
    family: Xu
  - given: Oluwaseyi
    family: Feyisetan
  - given: Nathanael
    family: Teissier
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8570-8591
  id: aggarwal22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8570
  lastpage: 8591
  published: 2022-05-03 00:00:00 +0000
- title: ' Contrasting the landscape of contrastive and non-contrastive learning '
  abstract: ' A lot of recent advances in unsupervised feature learning are based on designing features which are invariant under semantic data augmentations. A common way to do this is contrastive learning, which uses positive and negative samples. Some recent works however have shown promising results for non-contrastive learning, which does not require negative samples. However, the non-contrastive losses have obvious “collapsed” minima, in which the encoders output a constant feature embedding, independent of the input. A folk conjecture is that so long as these collapsed solutions are avoided, the produced feature representations should be good. In our paper, we cast doubt on this story: we show through theoretical results and controlled experiments that even on simple data models, non-contrastive losses have a preponderance of non-collapsed bad minima. Moreover, we show that the training process does not avoid these minima. Code for this work can be found at https://github.com/ashwinipokle/contrastive_landscape. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/pokle22a.html
  PDF: https://proceedings.mlr.press/v151/pokle22a/pokle22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-pokle22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ashwini
    family: Pokle
  - given: Jinjin
    family: Tian
  - given: Yuchen
    family: Li
  - given: Andrej
    family: Risteski
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8592-8618
  id: pokle22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8592
  lastpage: 8618
  published: 2022-05-03 00:00:00 +0000
- title: ' A general class of surrogate functions for stable and efficient reinforcement learning '
  abstract: ' Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO, or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives rise to an entire family of surrogate functions. We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions. Crucially, these guarantees hold regardless of the choice of policy parameterization. Moreover, a particular instantiation of FMA-PG recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) resulting in a variant of TRPO with additional desirable properties. Via experiments on simple reinforcement learning problems, we evaluate the algorithms instantiated by FMA-PG. The proposed framework also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on the MuJoCo suite. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/vaswani22a.html
  PDF: https://proceedings.mlr.press/v151/vaswani22a/vaswani22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-vaswani22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sharan
    family: Vaswani
  - given: Olivier
    family: Bachem
  - given: Simone
    family: Totaro
  - given: Robert
    family: Müller
  - given: Shivam
    family: Garg
  - given: Matthieu
    family: Geist
  - given: Marlos C.
    family: Machado
  - given: Pablo
    family: Samuel Castro
  - given: Nicolas
    family: Le Roux
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8619-8649
  id: vaswani22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8619
  lastpage: 8649
  published: 2022-05-03 00:00:00 +0000
- title: ' Fundamental limits for rank-one matrix estimation with groupwise heteroskedasticity '
  abstract: ' Low-rank matrix recovery problems involving high-dimensional and heterogeneous data appear in applications throughout statistics and machine learning. The contribution of this paper is to establish the fundamental limits of recovery for a broad class of these problems. In particular, we study the problem of estimating a rank-one matrix from Gaussian observations where different blocks of the matrix are observed under different noise levels. In the setting where the number of blocks is fixed while the number of variables tends to infinity, we prove asymptotically exact formulas for the minimum mean-squared error in estimating both the matrix and underlying factors. These results are based on a novel reduction from the low-rank matrix tensor product model (with homogeneous noise) to a rank-one model with heteroskedastic noise. As an application of our main result, we show that recently proposed methods based on applying principal component analysis (PCA) to weighted combinations of the data are optimal in some settings but sub-optimal in others. We also provide numerical results comparing our asymptotic formulas with the performance of methods based on weighted PCA, gradient descent, and approximate message passing. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/behne22a.html
  PDF: https://proceedings.mlr.press/v151/behne22a/behne22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-behne22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Joshua K.
    family: Behne
  - given: Galen
    family: Reeves
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8650-8672
  id: behne22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8650
  lastpage: 8672
  published: 2022-05-03 00:00:00 +0000
- title: ' Characterizing and Understanding the Generalization Error of Transfer Learning with Gibbs Algorithm '
  abstract: ' We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms by focusing on two popular empirical risk minimization (ERM) approaches for transfer learning, $\alpha$-weighted-ERM and two-stage-ERM. Our key result is an exact characterization of the generalization behavior using the conditional symmetrized Kullback-Leibler (KL) information between the output hypothesis and the target training samples given the source training samples. Our results can also be applied to provide novel distribution-free generalization error upper bounds on these two aforementioned Gibbs algorithms. Our approach is versatile, as it also characterizes the generalization errors and excess risks of these two Gibbs algorithms in the asymptotic regime, where they converge to the $\alpha$-weighted-ERM and two-stage-ERM, respectively. Based on our theoretical results, we show that the benefits of transfer learning can be viewed as a bias-variance trade-off, with the bias induced by the source distribution and the variance induced by the lack of target samples. We believe this viewpoint can guide the choice of transfer learning algorithms in practice. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bu22a.html
  PDF: https://proceedings.mlr.press/v151/bu22a/bu22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bu22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuheng
    family: Bu
  - given: Gholamali
    family: Aminian
  - given: Laura
    family: Toni
  - given: Gregory W.
    family: Wornell
  - given: Miguel
    family: Rodrigues
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8673-8699
  id: bu22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8673
  lastpage: 8699
  published: 2022-05-03 00:00:00 +0000
- title: ' Decoupling Local and Global Representations of Time Series '
  abstract: ' Real-world time series data are often generated from several sources of variation. Learning representations that capture the factors contributing to this variability enables better understanding of the data via its underlying generative process and can lead to improvements in performance on downstream machine learning tasks. In this paper, we propose a novel generative approach for learning representations for the global and local factors of variation in time series data. The local representation of each sample models non-stationarity over time with a stochastic process prior, and the global representation of the sample encodes the time-independent characteristics. To encourage decoupling between the representations, we introduce a counterfactual regularization that minimizes the mutual information between the two variables. In experiments, we demonstrate successful recovery of the true local and global factors of variability on simulated data, and show that representations learned using our method lead to superior performance on downstream tasks on real-world datasets. We believe that the proposed way of defining representations is beneficial for data modelling and can yield better insights into the complexity of the real-world data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tonekaboni22a.html
  PDF: https://proceedings.mlr.press/v151/tonekaboni22a/tonekaboni22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tonekaboni22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sana
    family: Tonekaboni
  - given: Chun-Liang
    family: Li
  - given: Sercan O.
    family: Arik
  - given: Anna
    family: Goldenberg
  - given: Tomas
    family: Pfister
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8700-8714
  id: tonekaboni22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8700
  lastpage: 8714
  published: 2022-05-03 00:00:00 +0000
- title: ' Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization '
  abstract: ' The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty due to its simplicity and efficiency. Despite its popularity, the theoretical aspects of this algorithm have not been properly analyzed. In particular, whether the EI strategy with a standard incumbent converges in the noisy setting is still an open question in Gaussian process bandit optimization. We aim to answer this question by proposing a variant of EI with a standard incumbent defined via the GP predictive mean. We prove that our algorithm converges, and achieves a cumulative regret bound of $\mathcal O(\gamma_T\sqrt{T})$, where $\gamma_T$ is the maximum information gain between $T$ observations and the Gaussian process model. Based on this variant of EI, we further propose an algorithm called Improved GP-EI that converges faster than previous counterparts. In particular, our proposed variants of EI do not require the knowledge of the RKHS norm and the noise’s sub-Gaussianity parameter as in previous works. Empirical validation in our paper demonstrates the effectiveness of our algorithms compared to several baselines. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tran-the22a.html
  PDF: https://proceedings.mlr.press/v151/tran-the22a/tran-the22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tran-the22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hung
    family: Tran-The
  - given: Sunil
    family: Gupta
  - given: Santu
    family: Rana
  - given: Svetha
    family: Venkatesh
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8715-8737
  id: tran-the22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8715
  lastpage: 8737
  published: 2022-05-03 00:00:00 +0000
- title: ' Optimal estimation of Gaussian DAG models '
  abstract: ' We study the optimal sample complexity of learning a Gaussian directed acyclic graph (DAG) from observational data. Our main results establish the minimax optimal sample complexity for learning the structure of a linear Gaussian DAG model in two settings of interest: 1) Under equal variances without knowledge of the true ordering, and 2) For general linear models given knowledge of the ordering. In both cases the sample complexity is $n\asymp q\log(d/q)$, where $q$ is the maximum number of parents and $d$ is the number of nodes. We further make comparisons with the classical problem of learning (undirected) Gaussian graphical models, showing that under the equal variance assumption, these two problems share the same optimal sample complexity. In other words, at least for Gaussian models with equal error variances, learning a directed graphical model is statistically no more difficult than learning an undirected graphical model. Our results also extend to more general identification assumptions as well as subgaussian errors. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/gao22a.html
  PDF: https://proceedings.mlr.press/v151/gao22a/gao22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-gao22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ming
    family: Gao
  - given: Wai Ming
    family: Tai
  - given: Bryon
    family: Aragam
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8738-8757
  id: gao22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8738
  lastpage: 8757
  published: 2022-05-03 00:00:00 +0000
- title: ' Improved Approximation Algorithms for Individually Fair Clustering '
  abstract: ' We consider the $k$-clustering problem with $\ell_p$-norm cost, which includes $k$-median, $k$-means and $k$-center, under an individual notion of fairness proposed by Jung et al. [2020]: given a set of points $P$ of size $n$, a set of $k$ centers induces a fair clustering if every point in $P$ has a center among its $n/k$ closest neighbors. Mahabadi and Vakilian [2020] presented a $( p^{O(p)},7)$-bicriteria approximation for fair clustering with $\ell_p$-norm cost: every point finds a center within distance at most $7$ times its distance to its $(n/k)$-th closest neighbor and the $\ell_p$-norm cost of the solution is at most $p^{O(p)}$ times the cost of an optimal fair solution. In this work, for any $\epsilon>0$, we present an improved $(16^p +\epsilon,3)$-bicriteria approximation for this problem. Moreover, for $p=1$ ($k$-median) and $p=\infty$ ($k$-center), we present improved cost-approximation factors $7.081+\epsilon$ and $3+\epsilon$ respectively. To achieve our guarantees, we extend the framework of [Charikar et al., 2002; Swamy, 2016] and devise a $16^p$-approximation algorithm for the facility location problem with $\ell_p$-norm cost under a matroid constraint, which might be of independent interest. Besides, our approach suggests a reduction from our individually fair clustering to a clustering with a group fairness requirement proposed by Kleindessner et al. [2019], which is essentially the median matroid problem. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/vakilian22a.html
  PDF: https://proceedings.mlr.press/v151/vakilian22a/vakilian22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-vakilian22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ali
    family: Vakilian
  - given: Mustafa
    family: Yalciner
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8758-8779
  id: vakilian22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8758
  lastpage: 8779
  published: 2022-05-03 00:00:00 +0000
- title: ' Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning '
  abstract: ' Data Shapley has recently been proposed as a principled framework to quantify the contribution of each individual datum in machine learning. It can effectively identify helpful or harmful data points for a learning algorithm. In this paper, we propose Beta Shapley, which is a substantial generalization of Data Shapley. Beta Shapley arises naturally by relaxing the efficiency axiom of the Shapley value, which is not critical for machine learning settings. Beta Shapley unifies several popular data valuation methods and includes Data Shapley as a special case. Moreover, we prove that Beta Shapley has several desirable statistical properties and propose efficient algorithms to estimate it. We demonstrate that Beta Shapley outperforms state-of-the-art data valuation methods on several downstream ML tasks such as: 1) detecting mislabeled training data; 2) learning with subsamples; and 3) identifying points whose addition or removal has the largest positive or negative impact on the model. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kwon22a.html
  PDF: https://proceedings.mlr.press/v151/kwon22a/kwon22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kwon22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yongchan
    family: Kwon
  - given: James
    family: Zou
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8780-8802
  id: kwon22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8780
  lastpage: 8802
  published: 2022-05-03 00:00:00 +0000
- title: ' Variance Minimization in the Wasserstein Space for Invariant Causal Prediction '
  abstract: ' Selecting powerful predictors for an outcome is a cornerstone task for machine learning. However, some types of questions can only be answered by identifying the predictors that causally affect the outcome. A recent approach to this causal inference problem leverages the invariance property of a causal mechanism across differing experimental environments (Peters et al., 2016; Heinze-Deml et al., 2018). This method, invariant causal prediction (ICP), has a substantial computational defect – the runtime scales exponentially with the number of possible causal variables. In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors. Each of these tests relies on the minimization of a novel loss function – the Wasserstein variance – that is derived from tools in optimal transport theory and is used to quantify distributional variability across environments. We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/martinet22a.html
  PDF: https://proceedings.mlr.press/v151/martinet22a/martinet22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-martinet22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Guillaume G.
    family: Martinet
  - given: Alexander
    family: Strzalkowski
  - given: Barbara
    family: Engelhardt
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8803-8851
  id: martinet22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8803
  lastpage: 8851
  published: 2022-05-03 00:00:00 +0000
- title: ' Online Continual Adaptation with Active Self-Training '
  abstract: ' Models trained with offline data often suffer from continual distribution shifts and expensive labeling in changing environments. This calls for a new online learning paradigm where the learner can continually adapt to changing environments with limited labels. In this paper, we propose a new online setting – Online Active Continual Adaptation, where the learner aims to continually adapt to changing distributions using both unlabeled samples and active queries of limited labels. To this end, we propose Online Self-Adaptive Mirror Descent (OSAMD), which adopts an online teacher-student structure to enable online self-training from unlabeled data, and a margin-based criterion that decides whether to query the labels to track changing distributions. Theoretically, we show that, in the separable case, OSAMD has an $O({T}^{2/3})$ dynamic regret bound under mild assumptions, which is aligned with the $\Omega(T^{2/3})$ lower bound of online learning algorithms with full labels. In the general case, we show a regret bound of $O({T}^{2/3} + \alpha^* T)$, where $\alpha^*$ denotes the separability of domains and is usually small. Our theoretical results show that OSAMD can adapt quickly to changing environments with active queries. Empirically, we demonstrate that OSAMD achieves favorable regrets under changing environments with limited labels on both simulated and real-world data, which corroborates our theoretical findings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhou22d.html
  PDF: https://proceedings.mlr.press/v151/zhou22d/zhou22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhou22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shiji
    family: Zhou
  - given: Han
    family: Zhao
  - given: Shanghang
    family: Zhang
  - given: Lianzhe
    family: Wang
  - given: Heng
    family: Chang
  - given: Zhi
    family: Wang
  - given: Wenwu
    family: Zhu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8852-8883
  id: zhou22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8852
  lastpage: 8883
  published: 2022-05-03 00:00:00 +0000
- title: ' Structured variational inference in Bayesian state-space models '
  abstract: ' Variational inference is routinely deployed in Bayesian state-space models as an efficient computational technique. Motivated by the inconsistency issue observed by Wang and Titterington (2004) for the mean-field approximation in linear state-space models, we consider a more expressive variational family for approximating the joint posterior of the latent variables to retain their dependence, while maintaining the mean-field (i.e. independence) structure between latent variables and parameters. In state-space models, such a latent structure adapted mean-field approximation can be efficiently computed using the belief propagation algorithm. Theoretically, we show that this adapted mean-field approximation achieves consistency of the variational estimates. Furthermore, we derive a non-asymptotic risk bound for an averaged alpha-divergence from the true data generating model, suggesting that the posterior mean of the best variational approximation for the static parameters shows optimal concentration. From a broader perspective, we add to the growing literature on statistical accuracy of variational approximations by allowing dependence between the latent variables, and the techniques developed here should be useful in related contexts. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22g.html
  PDF: https://proceedings.mlr.press/v151/wang22g/wang22g.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22g.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Honggang
    family: Wang
  - given: Anirban
    family: Bhattacharya
  - given: Debdeep
    family: Pati
  - given: Yun
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8884-8905
  id: wang22g
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8884
  lastpage: 8905
  published: 2022-05-03 00:00:00 +0000
- title: ' Structured Multi-task Learning for Molecular Property Prediction '
  abstract: ' Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, in contrast to other domains, the performance of multi-task learning in drug discovery is still not satisfactory, as the amount of labeled data for each task is too limited, which calls for additional data to compensate for the data scarcity. In this paper, we study multi-task learning for molecular property prediction in a novel setting, where a relation graph between tasks is available. We first construct a dataset including around 400 tasks as well as a task relation graph. Then, to better utilize such a relation graph, we propose a method called SGNN-EBM to systematically investigate structured task modeling from two perspectives. (1) In the latent space, we model the task representations by applying a state graph neural network (SGNN) on the relation graph. (2) In the output space, we employ structured prediction with the energy-based model (EBM), which can be efficiently trained through the noise-contrastive estimation (NCE) approach. Empirical results justify the effectiveness of SGNN-EBM. Code is available on https://github.com/chao1224/SGNN-EBM. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liu22e.html
  PDF: https://proceedings.mlr.press/v151/liu22e/liu22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liu22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shengchao
    family: Liu
  - given: Meng
    family: Qu
  - given: Zuobai
    family: Zhang
  - given: Huiyu
    family: Cai
  - given: Jian
    family: Tang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8906-8920
  id: liu22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8906
  lastpage: 8920
  published: 2022-05-03 00:00:00 +0000
- title: ' Nonparametric Relational Models with Superrectangulation '
  abstract: ' This paper addresses the question, “What is the smallest object that contains all rectangular partitions with n or fewer blocks?” and shows its application to relational data analysis using a new strategy we call super Bayes as an alternative to Bayesian nonparametric (BNP) methods. Conventionally, standard BNP methods have combined the Aldous-Hoover-Kallenberg representation with parsimonious stochastic processes on rectangular partitioning to construct BNP relational models. As a result, conventional methods face the great difficulty of searching for a parsimonious random rectangular partition that fits the observed data well in Bayesian inference. As a way to essentially avoid such a problem, we propose a strategy to combine an extremely redundant rectangular partition as a deterministic (non-probabilistic) object. Specifically, we introduce a special kind of rectangular partitioning, which we call superrectangulation, that contains all possible rectangular partitions. Delightfully, this strategy completely eliminates the difficult task of searching around for random rectangular partitions, since the superrectangulation is deterministically fixed in inference. Experiments on predictive performance in relational data analysis show that the super Bayesian model provides a more stable analysis than the existing BNP models, which are less likely to be trapped in bad local optima. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nakano22a.html
  PDF: https://proceedings.mlr.press/v151/nakano22a/nakano22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nakano22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Masahiro
    family: Nakano
  - given: Ryo
    family: Nishikimi
  - given: Yasuhiro
    family: Fujiwara
  - given: Akisato
    family: Kimura
  - given: Takeshi
    family: Yamada
  - given: Naonori
    family: Ueda
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8921-8937
  id: nakano22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8921
  lastpage: 8937
  published: 2022-05-03 00:00:00 +0000
- title: ' Regret, stability & fairness in matching markets with bandit learners '
  abstract: ' Making an informed decision—for example, when choosing a career or housing—requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition—namely, costs and transfers—we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cen22a.html
  PDF: https://proceedings.mlr.press/v151/cen22a/cen22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cen22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sarah H.
    family: Cen
  - given: Devavrat
    family: Shah
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8938-8968
  id: cen22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8938
  lastpage: 8968
  published: 2022-05-03 00:00:00 +0000
- title: ' Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods '
  abstract: ' As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models. However, there has been little to no work on systematically analyzing the reliability of these methods. Here, we introduce the first-ever theoretical analysis of the reliability of state-of-the-art GNN explanation methods. More specifically, we theoretically analyze the behavior of various state-of-the-art GNN explanation methods with respect to several desirable properties (e.g., faithfulness, stability, and fairness preservation) and establish upper bounds on the violation of these properties. We also empirically validate our theoretical results using extensive experimentation with nine real-world graph datasets. Our empirical results further shed light on several interesting insights about the behavior of state-of-the-art GNN explanation methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/agarwal22b.html
  PDF: https://proceedings.mlr.press/v151/agarwal22b/agarwal22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-agarwal22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chirag
    family: Agarwal
  - given: Marinka
    family: Zitnik
  - given: Himabindu
    family: Lakkaraju
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8969-8996
  id: agarwal22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8969
  lastpage: 8996
  published: 2022-05-03 00:00:00 +0000
- title: ' Distributionally Robust Structure Learning for Discrete Pairwise Markov Networks '
  abstract: ' We consider the problem of learning the underlying structure of a general discrete pairwise Markov network. Existing approaches that rely on empirical risk minimization may perform poorly in settings with noisy or scarce data. To overcome these limitations, we propose a computationally efficient and robust learning method for this problem with near-optimal sample complexities. Our approach builds upon distributionally robust optimization (DRO) and maximum conditional log-likelihood. The proposed DRO estimator minimizes the worst-case risk over an ambiguity set of adversarial distributions within bounded transport cost or f-divergence of the empirical data distribution. We show that the primal minimax learning problem can be efficiently solved by leveraging sufficient statistics and greedy maximization in the ostensibly intractable dual formulation. Based on DRO’s approximation to Lipschitz and variance regularization, we derive near-optimal sample complexities matching existing results. Extensive empirical evidence with different corruption models corroborates the effectiveness of the proposed methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/li22f.html
  PDF: https://proceedings.mlr.press/v151/li22f/li22f.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-li22f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yeshu
    family: Li
  - given: Zhan
    family: Shi
  - given: Xinhua
    family: Zhang
  - given: Brian
    family: Ziebart
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 8997-9016
  id: li22f
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 8997
  lastpage: 9016
  published: 2022-05-03 00:00:00 +0000
- title: ' ExactBoost: Directly Boosting the Margin in Combinatorial and Non-decomposable Metrics '
  abstract: ' Many classification algorithms require the use of surrogate losses when the intended loss function is combinatorial or non-decomposable. This paper introduces a fast and exact stagewise optimization algorithm, dubbed ExactBoost, that boosts stumps to the actual loss function. By developing a novel extension of margin theory to the non-decomposable setting, it is possible to provably bound the generalization error of ExactBoost for many important metrics with different levels of non-decomposability. Through extensive examples, it is shown that such theoretical guarantees translate to competitive empirical performance. In particular, when used as an ensembler, ExactBoost is able to significantly outperform other surrogate-based and exact algorithms available. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/csillag22a.html
  PDF: https://proceedings.mlr.press/v151/csillag22a/csillag22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-csillag22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: Csillag
  - given: Carolina
    family: Piazza
  - given: Thiago
    family: Ramos
  - given: João Vitor
    family: Romano
  - given: Roberto I.
    family: Oliveira
  - given: Paulo
    family: Orenstein
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9017-9049
  id: csillag22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9017
  lastpage: 9049
  published: 2022-05-03 00:00:00 +0000
- title: ' Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective '
  abstract: ' Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the convergence rate of FedAvg has thus far been undetermined. Even under the simplest assumptions (convex, smooth, homogeneous, and bounded covariance), the best-known upper and lower bounds do not match, and it is not clear whether the existing analysis captures the capacity of the algorithm. In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows the existing FedAvg upper bound analysis is not improvable. Additionally, we establish a lower bound in a heterogeneous setting that nearly matches the existing upper bound. While our lower bounds show the limitations of FedAvg, under an additional assumption of third-order smoothness, we prove more optimistic state-of-the-art convergence results in both convex and non-convex settings. Our analysis stems from a notion we call iterate bias, which is defined by the deviation of the expectation of the SGD trajectory from the noiseless gradient descent trajectory with the same initialization. We prove novel sharp bounds on this quantity, and show intuitively how to analyze this quantity from a Stochastic Differential Equation (SDE) perspective. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/glasgow22a.html
  PDF: https://proceedings.mlr.press/v151/glasgow22a/glasgow22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-glasgow22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Margalit R.
    family: Glasgow
  - given: Honglin
    family: Yuan
  - given: Tengyu
    family: Ma
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9050-9090
  id: glasgow22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9050
  lastpage: 9090
  published: 2022-05-03 00:00:00 +0000
- title: ' Near-Optimal Task Selection for Meta-Learning with Mutual Information and Online Variational Bayesian Unlearning '
  abstract: ' This paper addresses the problem of active task selection which involves selecting the most informative tasks for meta-learning. We propose a novel active task selection criterion based on the mutual information between latent task vectors. Unfortunately, such a criterion scales poorly in the number of candidate tasks when optimized. To resolve this issue, we exploit the submodularity property of our new criterion for devising the first active task selection algorithm for meta-learning with a near-optimal performance guarantee. To further improve our efficiency, we propose an online variant of the Stein variational gradient descent to perform fast belief updates of the meta-parameters via maintaining a set of forward (and backward) particles when learning (or unlearning) from each selected task. We empirically demonstrate the performance of our proposed algorithm on real-world datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22h.html
  PDF: https://proceedings.mlr.press/v151/chen22h/chen22h.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22h.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yizhou
    family: Chen
  - given: Shizhuo
    family: Zhang
  - given: Bryan Kian Hsiang
    family: Low
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9091-9113
  id: chen22h
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9091
  lastpage: 9113
  published: 2022-05-03 00:00:00 +0000
- title: ' Best Arm Identification with Safety Constraints '
  abstract: ' The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many, while exploring in a way that guarantees certain, initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraint take a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem where we only assume the reward and safety constraint can be modeled by monotonic functions, and propose an algorithm in this setting which is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best drug out of many in order to treat an illness. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22h.html
  PDF: https://proceedings.mlr.press/v151/wang22h/wang22h.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22h.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zhenlin
    family: Wang
  - given: Andrew J.
    family: Wagenmaker
  - given: Kevin
    family: Jamieson
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9114-9146
  id: wang22h
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9114
  lastpage: 9146
  published: 2022-05-03 00:00:00 +0000
- title: ' On the complexity of the optimal transport problem with graph-structured cost '
  abstract: ' Multi-marginal optimal transport (MOT) is a generalization of optimal transport to multiple marginals. Optimal transport has evolved into an important tool in many machine learning applications, and its multi-marginal extension opens the way to addressing new challenges in the field of machine learning. However, the usage of MOT has been largely impeded by its computational complexity, which scales exponentially in the number of marginals. Fortunately, in many applications, such as barycenter or interpolation problems, the cost function adheres to structures, which have recently been exploited for developing efficient computational methods. In this work we derive computational bounds for these methods. In particular, with $m$ marginal distributions supported on $n$ points, we provide a $ \mathcal{\tilde O}(d(\mathcal{T})m n^{w(G)+1}\epsilon^{-2})$ bound for an $\epsilon$-accuracy when the problem is associated with a graph that can be factored as a junction tree with diameter $d(\mathcal{T})$ and tree-width $w(G)$. For the special case of the Wasserstein barycenter problem, which corresponds to a star-shaped tree, our bound is in alignment with the existing complexity bound for it. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/fan22a.html
  PDF: https://proceedings.mlr.press/v151/fan22a/fan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-fan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiaojiao
    family: Fan
  - given: Isabel
    family: Haasler
  - given: Johan
    family: Karlsson
  - given: Yongxin
    family: Chen
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9147-9165
  id: fan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9147
  lastpage: 9165
  published: 2022-05-03 00:00:00 +0000
- title: ' Second-Order Sensitivity Analysis for Bilevel Optimization '
  abstract: ' In this work we derive a second-order approach to bilevel optimization, a type of mathematical programming in which the solution to a parameterized optimization problem (the “lower” problem) is itself to be optimized (in the “upper” problem) as a function of the parameters. Many existing approaches to bilevel optimization employ first-order sensitivity analysis, based on the implicit function theorem (IFT), for the lower problem to derive a gradient of the lower problem solution with respect to its parameters; this IFT gradient is then used in a first-order optimization method for the upper problem. This paper extends this sensitivity analysis to provide second-order derivative information of the lower problem (which we call the IFT Hessian), enabling the usage of faster-converging second-order optimization methods at the upper level. Our analysis shows that (i) much of the computation already used to produce the IFT gradient can be reused for the IFT Hessian, (ii) error bounds derived for the IFT gradient readily apply to the IFT Hessian, and (iii) computing IFT Hessians can significantly reduce overall computation by extracting more information from each lower level solve. We corroborate our findings and demonstrate the broad range of applications of our method by applying it to problem instances of least squares hyperparameter auto-tuning, multi-class SVM auto-tuning, and inverse optimal control. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dyro22a.html
  PDF: https://proceedings.mlr.press/v151/dyro22a/dyro22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dyro22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Robert
    family: Dyro
  - given: Edward
    family: Schmerling
  - given: Nikos
    family: Arechiga
  - given: Marco
    family: Pavone
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9166-9181
  id: dyro22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9166
  lastpage: 9181
  published: 2022-05-03 00:00:00 +0000
- title: ' On Learning Mixture Models with Sparse Parameters '
  abstract: ' Mixture models are widely used to fit complex and multimodal datasets. In this paper we study mixtures with high dimensional sparse latent parameter vectors and consider the problem of support recovery of those vectors. While parameter learning in mixture models is well-studied, the sparsity constraint remains relatively unexplored. Sparsity of parameter vectors is a natural constraint in a variety of settings, and support recovery is a major step towards parameter estimation. We provide efficient algorithms for support recovery that have a logarithmic sample complexity dependence on the dimensionality of the latent space. Our algorithms are quite general, namely they are applicable to 1) mixtures of many different canonical distributions including Uniform, Poisson, Laplace, Gaussians, etc. 2) Mixtures of linear regressions and linear classifiers with Gaussian covariates under different assumptions on the unknown parameters. In most of these settings, our results are the first guarantees on this problem while in the rest, we provide significant improvements on existing results in certain regimes. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/pal22a.html
  PDF: https://proceedings.mlr.press/v151/pal22a/pal22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-pal22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Soumyabrata
    family: Pal
  - given: Arya
    family: Mazumdar
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9182-9213
  id: pal22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9182
  lastpage: 9213
  published: 2022-05-03 00:00:00 +0000
- title: ' Sketch-and-lift: scalable subsampled semidefinite program for K-means clustering '
  abstract: ' Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite the high accuracy, semidefinite programs are often too slow in practice with poor scalability on large (or even moderate) datasets. In this paper, we introduce a linear time complexity algorithm for approximating an SDP relaxed K-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. It is shown that the SL approach enjoys a similar exact recovery threshold as the K-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive with enhanced theoretic properties when the cluster sizes are unbalanced. Our simulation experiments demonstrate that the statistical accuracy of the proposed method outperforms state-of-the-art fast clustering algorithms without sacrificing too much computational efficiency, and is comparable to the original K-means SDP with substantially reduced runtime. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhuang22a.html
  PDF: https://proceedings.mlr.press/v151/zhuang22a/zhuang22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhuang22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yubo
    family: Zhuang
  - given: Xiaohui
    family: Chen
  - given: Yun
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9214-9246
  id: zhuang22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9214
  lastpage: 9246
  published: 2022-05-03 00:00:00 +0000
- title: ' Asynchronous Distributed Optimization with Stochastic Delays '
  abstract: ' We study asynchronous finite sum minimization in a distributed-data setting with a central parameter server. While asynchrony is well understood in parallel settings where the data is accessible by all machines (e.g., modifications of variance-reduced gradient algorithms like SAGA work well), little is known for the distributed-data setting. We develop an algorithm ADSAGA based on SAGA for the distributed-data setting, in which the data is partitioned between many machines. We show that with $m$ machines, under a natural stochastic delay model with a mean delay of $m$, ADSAGA converges in $\tilde{O}\left(\left(n + \sqrt{m}\kappa\right)\log(1/\epsilon)\right)$ iterations, where $n$ is the number of component functions, and $\kappa$ is a condition number. This complexity sits squarely between the complexity $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ of SAGA without delays and the complexity $\tilde{O}\left(\left(n + m\kappa\right)\log(1/\epsilon)\right)$ of parallel asynchronous algorithms where the delays are arbitrary (but bounded by $O(m)$), and the data is accessible by all. Existing asynchronous algorithms for the distributed-data setting with arbitrary delays have only been shown to converge in $\tilde{O}(n^2\kappa\log(1/\epsilon))$ iterations. We empirically compare on least-squares problems the iteration complexity and wallclock performance of ADSAGA to existing parallel and distributed algorithms, including synchronous minibatch algorithms. Our results demonstrate the wallclock advantage of variance-reduced asynchronous approaches over SGD or synchronous approaches. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/glasgow22b.html
  PDF: https://proceedings.mlr.press/v151/glasgow22b/glasgow22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-glasgow22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Margalit R.
    family: Glasgow
  - given: Mary
    family: Wootters
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9247-9279
  id: glasgow22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9247
  lastpage: 9279
  published: 2022-05-03 00:00:00 +0000
- title: ' Calibration Error for Heterogeneous Treatment Effects '
  abstract: ' Recently, many researchers have advanced data-driven methods for modeling heterogeneous treatment effects (HTEs). Even so, estimation of HTEs is a difficult task: these methods frequently over- or under-estimate the treatment effects, leading to poor calibration of the resulting models. However, while many methods exist for evaluating the calibration of prediction and classification models, formal approaches to assess the calibration of HTE models are limited to the calibration slope. In this paper, we define an analogue of the (L2) expected calibration error for HTEs, and propose a robust estimator. Our approach is motivated by doubly robust treatment effect estimators, making it unbiased, and resilient to confounding, overfitting, and high-dimensionality issues. Furthermore, our method is straightforward to adapt to many structures under which treatment effects can be identified, including randomized trials, observational studies, and survival analysis. We illustrate how to use our proposed metric to evaluate the calibration of learned HTE models through the application to the CRITEO-UPLIFT Trial. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/xu22c.html
  PDF: https://proceedings.mlr.press/v151/xu22c/xu22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-xu22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yizhe
    family: Xu
  - given: Steve
    family: Yadlowsky
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9280-9303
  id: xu22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9280
  lastpage: 9303
  published: 2022-05-03 00:00:00 +0000
- title: ' Fast Sparse Classification for Generalized Linear and Additive Models '
  abstract: ' We present fast classification techniques for sparse generalized linear and additive models. These techniques can handle thousands of features and thousands of observations in minutes, even in the presence of many highly correlated features. For fast sparse logistic regression, our computational speed-up over other best-subset search techniques owes to linear and quadratic surrogate cuts for the logistic loss that allow us to efficiently screen features for elimination, as well as use of a priority queue that favors a more uniform exploration of features. As an alternative to the logistic loss, we propose the exponential loss, which permits an analytical solution to the line search at each iteration. Our algorithms are generally 2 to 5 times faster than previous approaches. They produce interpretable models that have accuracy comparable to black box models on challenging datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liu22f.html
  PDF: https://proceedings.mlr.press/v151/liu22f/liu22f.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liu22f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiachang
    family: Liu
  - given: Chudi
    family: Zhong
  - given: Margo
    family: Seltzer
  - given: Cynthia
    family: Rudin
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9304-9333
  id: liu22f
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9304
  lastpage: 9333
  published: 2022-05-03 00:00:00 +0000
- title: ' A Unified View of SDP-based Neural Network Verification through Completely Positive Programming '
  abstract: ' Verifying that input-output relationships of a neural network conform to prescribed operational specifications is a key enabler towards deploying these networks in safety-critical applications. Semidefinite programming (SDP)-based approaches to Rectified Linear Unit (ReLU) network verification transcribe this problem into an optimization problem, where the accuracy of any such formulation reflects the level of fidelity in how the neural network computation is represented, as well as the relaxations of intractable constraints. While the literature contains much progress on improving the tightness of SDP formulations while maintaining tractability, comparatively little work has been devoted to the other extreme, i.e., how to most accurately capture the original verification problem before SDP relaxation. In this work, we develop an exact, convex formulation of verification as a completely positive program (CPP), and provide analysis showing that our formulation is minimal: the removal of any constraint fundamentally misrepresents the neural network computation. We leverage our formulation to provide a unifying view of existing approaches, and give insight into the source of large relaxation gaps observed in some cases. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/brown22b.html
  PDF: https://proceedings.mlr.press/v151/brown22b/brown22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-brown22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Robin A.
    family: Brown
  - given: Edward
    family: Schmerling
  - given: Navid
    family: Azizan
  - given: Marco
    family: Pavone
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9334-9355
  id: brown22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9334
  lastpage: 9355
  published: 2022-05-03 00:00:00 +0000
- title: ' Recoverability Landscape of Tree Structured Markov Random Fields under Symmetric Noise '
  abstract: ' We study the problem of learning tree-structured Markov random fields (MRF) on discrete random variables with common support when the observations are corrupted by a k-ary symmetric noise channel with unknown probability of error. For Ising models (support size = 2), past work has shown that graph structure can only be recovered up to the leaf clusters (a leaf node, its parent, and its siblings form a leaf cluster) and exact recovery is impossible. No prior work has addressed the setting of support size of 3 or more, and indeed this setting is far richer. As we show, when the support size is 3 or more, the structure of the leaf clusters may be partially or fully identifiable. We provide a precise characterization of this phenomenon and show that the extent of recoverability is dictated by the joint PMF of the random variables. In particular, we provide necessary and sufficient conditions for exact recoverability. Furthermore, we present a polynomial time, sample efficient algorithm that recovers the exact tree when this is possible, or up to the unidentifiability as promised by our characterization, when full recoverability is impossible. Finally, we demonstrate the efficacy of our algorithm experimentally. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/katiyar22a.html
  PDF: https://proceedings.mlr.press/v151/katiyar22a/katiyar22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-katiyar22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ashish
    family: Katiyar
  - given: Soumya
    family: Basu
  - given: Vatsal
    family: Shah
  - given: Constantine
    family: Caramanis
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9356-9399
  id: katiyar22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9356
  lastpage: 9399
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Sparse Fixed-Structure Gaussian Bayesian Networks '
  abstract: ' Gaussian Bayesian networks are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression LeastSquares and prove that it has near-optimal sample complexity. We also study a couple of new algorithms for the problem: BatchAvgLeastSquares takes the average of several batches of least squares solutions at each node, so that one can interpolate between the batch size and the number of batches. We show that BatchAvgLeastSquares also has near-optimal sample complexity. CauchyEst takes the median of solutions to several batches of linear systems at each node. We show that the algorithm specialized to polytrees, CauchyEstTree, has near-optimal sample complexity. Experimentally, we show that for uncontaminated, realizable data, the LeastSquares algorithm performs best, but in the presence of contamination or DAG misspecification, CauchyEst/CauchyEstTree and BatchAvgLeastSquares respectively perform better. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bhattacharyya22b.html
  PDF: https://proceedings.mlr.press/v151/bhattacharyya22b/bhattacharyya22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bhattacharyya22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arnab
    family: Bhattacharyya
  - given: Davin
    family: Choo
  - given: Rishikesh
    family: Gajjala
  - given: Sutanu
    family: Gayen
  - given: Yuhao
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9400-9429
  id: bhattacharyya22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9400
  lastpage: 9429
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning and Generalization in Overparameterized Normalizing Flows '
  abstract: ' In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an important class of models in unsupervised learning for sampling and density estimation. In this paper, we theoretically and empirically analyze these models when the underlying neural network is a one-hidden-layer overparameterized network. Our main contributions are two-fold: (1) On the one hand, we provide theoretical and empirical evidence that for constrained NFs (this class of NFs underlies most NF constructions) with the one-hidden-layer network, overparameterization hurts training. (2) On the other hand, we prove that unconstrained NFs, a recently introduced model, can efficiently learn any reasonable data distribution under minimal assumptions when the underlying network is overparameterized and has one hidden layer. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shah22c.html
  PDF: https://proceedings.mlr.press/v151/shah22c/shah22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shah22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kulin
    family: Shah
  - given: Amit
    family: Deshpande
  - given: Navin
    family: Goyal
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9430-9504
  id: shah22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9430
  lastpage: 9504
  published: 2022-05-03 00:00:00 +0000
- title: ' Expressivity of Neural Networks via Chaotic Itineraries beyond Sharkovsky’s Theorem '
  abstract: ' Given a target function $f$, how large must a neural network be in order to approximate $f$? Recent works examine this basic question on neural network expressivity through the lens of dynamical systems and provide novel “depth-vs-width” tradeoffs for a large family of functions $f$. They suggest that such tradeoffs are governed by the existence of periodic points or cycles in $f$. Our work, by further deploying dynamical systems concepts, illuminates a more subtle connection between periodicity and expressivity: we prove that periodic points alone lead to suboptimal depth-width tradeoffs and we improve upon them by demonstrating that certain “chaotic itineraries” give stronger exponential tradeoffs, even in regimes where previous analyses only imply polynomial gaps. Contrary to prior works, our bounds are nearly optimal, tighten as the period increases, and handle strong notions of inapproximability (e.g., constant $L_1$ error). More broadly, we identify a phase transition to the chaotic regime that exactly coincides with an abrupt shift in other notions of function complexity, including VC-dimension and topological entropy. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sanford22a.html
  PDF: https://proceedings.mlr.press/v151/sanford22a/sanford22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sanford22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Clayton H.
    family: Sanford
  - given: Vaggos
    family: Chatziafratis
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9505-9549
  id: sanford22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9505
  lastpage: 9549
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Interpretable, Tree-Based Projection Mappings for Nonlinear Embeddings '
  abstract: ' Model interpretability is a topic of renewed interest given today’s widespread practical use of machine learning, and the need to trust or understand automated predictions. We consider the problem of optimally learning interpretable out-of-sample mappings for nonlinear embedding methods such as $t$-SNE. We argue for the use of sparse oblique decision trees because they strike a good tradeoff between accuracy and interpretability which can be controlled via a hyperparameter, thus allowing one to achieve a model with a desired explanatory complexity. The resulting optimization problem is difficult because decision trees are not differentiable. By using an equivalent formulation of the problem, we give an algorithm that can learn such a tree for any given nonlinear embedding objective. We illustrate experimentally how the resulting trees provide insights into the data beyond what a simple 2D visualization of the embedding does. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zharmagambetov22a.html
  PDF: https://proceedings.mlr.press/v151/zharmagambetov22a/zharmagambetov22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zharmagambetov22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arman S.
    family: Zharmagambetov
  - given: Miguel A.
    family: Carreira-Perpinan
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9550-9570
  id: zharmagambetov22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9550
  lastpage: 9570
  published: 2022-05-03 00:00:00 +0000
- title: ' Disentangling Whether from When in a Neural Mixture Cure Model for Failure Time Data '
  abstract: ' The mixture cure model allows failure probability to be estimated separately from failure timing in settings wherein failure never occurs in a subset of the population. In this paper, we draw on insights from representation learning and causal inference to develop a neural network based mixture cure model that is free of distributional assumptions, yielding improved prediction of failure timing, yet still effectively disentangles information about failure timing from information about failure probability. Our approach also mitigates effects of selection biases in the observation of failure and censoring times on estimation of the failure density and censoring density, respectively. Results suggest this approach could be applied to distinguish factors predicting failure occurrence versus timing and mitigate biases in real-world observational datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/engelhard22a.html
  PDF: https://proceedings.mlr.press/v151/engelhard22a/engelhard22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-engelhard22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matthew
    family: Engelhard
  - given: Ricardo
    family: Henao
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9571-9581
  id: engelhard22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9571
  lastpage: 9581
  published: 2022-05-03 00:00:00 +0000
- title: ' Sample Complexity of Robust Reinforcement Learning with a Generative Model '
  abstract: ' The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/panaganti22a.html
  PDF: https://proceedings.mlr.press/v151/panaganti22a/panaganti22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-panaganti22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kishan
    family: Panaganti
  - given: Dileep
    family: Kalathil
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9582-9602
  id: panaganti22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9582
  lastpage: 9602
  published: 2022-05-03 00:00:00 +0000
- title: ' On Coresets for Fair Regression and Individually Fair Clustering '
  abstract: ' In this paper we present coresets for Fair Regression with Statistical Parity (SP) constraints and for Individually Fair Clustering. Due to the fairness constraints, the classical coreset definition is not enough for these problems. We first define coresets for both problems. We show that to obtain such coresets, it is sufficient to sample points based on probabilities dependent on a combination of the sensitivity score and a carefully chosen term according to the fairness constraints. We give provable guarantees with relative error in preserving the cost and a small additive error in preserving fairness constraints for both problems. Since our coresets are much smaller in size as compared to $n$, the number of points, they can give huge benefits in computational costs (from polynomial to polylogarithmic in $n$), especially when $n \gg d$, where $d$ is the input dimension. We support our theoretical claims with experimental evaluations. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chhaya22a.html
  PDF: https://proceedings.mlr.press/v151/chhaya22a/chhaya22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chhaya22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rachit
    family: Chhaya
  - given: Anirban
    family: Dasgupta
  - given: Jayesh
    family: Choudhari
  - given: Supratim
    family: Shit
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9603-9625
  id: chhaya22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9603
  lastpage: 9625
  published: 2022-05-03 00:00:00 +0000
- title: ' LocoProp: Enhancing BackProp via Local Loss Optimization '
  abstract: ' Second-order methods have shown state-of-the-art performance for optimizing deep neural networks. Nonetheless, their large memory requirement and high computational complexity, compared to first-order methods, hinder their versatility in a typical low-budget setup. This paper introduces a general framework of layerwise loss construction for multilayer neural networks that achieves a performance closer to second-order methods while utilizing first-order optimizers only. Our methodology lies upon a three-component loss, target, and regularizer combination, for which altering each component results in a new update rule. We provide examples using squared loss and layerwise Bregman divergences induced by the convex integral functions of various transfer functions. Our experiments on benchmark models and datasets validate the efficacy of our new approach, reducing the gap between first-order and second-order optimizers. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/amid22a.html
  PDF: https://proceedings.mlr.press/v151/amid22a/amid22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-amid22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ehsan
    family: Amid
  - given: Rohan
    family: Anil
  - given: Manfred
    family: Warmuth
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9626-9642
  id: amid22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9626
  lastpage: 9642
  published: 2022-05-03 00:00:00 +0000
- title: ' Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs Linear Valuation with Unknown Noise '
  abstract: ' In feature-based dynamic pricing, a seller sets appropriate prices for a sequence of products (described by feature vectors) on the fly by learning from the binary outcomes of previous sales sessions ("Sold" if valuation $\geq$ price, and "Not Sold" otherwise). Existing works either assume noiseless linear valuation or precisely-known noise distribution, which limits the applicability of those algorithms in practice when these assumptions are hard to verify. In this work, we study two more agnostic models: (a) a "linear policy" problem where we aim at competing with the best linear pricing policy while making no assumptions on the data, and (b) a "linear noisy valuation" problem where the random valuation is linear plus an unknown and assumption-free noise. For the former model, we show a $\Theta(d^{1/3}T^{2/3})$ minimax regret up to logarithmic factors. For the latter model, we present an algorithm that achieves an $O(T^{3/4})$ regret and improve the best-known lower bound from $\Omega(T^{3/5})$ to $\Omega(T^{2/3})$. These results demonstrate that no-regret learning is possible for feature-based dynamic pricing under weak assumptions, but also reveal a disappointing fact that the seemingly richer pricing feedback is not significantly more useful than the bandit-feedback in regret reduction. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/xu22d.html
  PDF: https://proceedings.mlr.press/v151/xu22d/xu22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-xu22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jianyu
    family: Xu
  - given: Yu-Xiang
    family: Wang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9643-9662
  id: xu22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9643
  lastpage: 9662
  published: 2022-05-03 00:00:00 +0000
- title: ' A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds '
  abstract: ' Decision trees are important both as interpretable models amenable to high-stakes decision-making, and as building blocks of ensemble methods such as random forests and gradient boosting. Their statistical properties, however, are not well understood. The most cited prior works have focused on deriving pointwise consistency guarantees for CART in a classical nonparametric regression setting. We take a different approach, and advocate studying the generalization performance of decision trees with respect to different generative regression models. This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data, thereby guiding practitioners on when and how to apply these methods. In this paper, we focus on sparse additive generative models, which have both low statistical complexity and some nonparametric flexibility. We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models with $C^1$ component functions. This bound is surprisingly much worse than the minimax rate for estimating such sparse additive models. The inefficiency is due not to greediness, but to the loss in power for detecting global structure when we average responses solely over each leaf, an observation that suggests opportunities to improve tree-based algorithms, for example, by hierarchical shrinkage. To prove these bounds, we develop new technical machinery, establishing a novel connection between decision tree estimation and rate-distortion theory, a sub-field of information theory. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shuo-tan22a.html
  PDF: https://proceedings.mlr.press/v151/shuo-tan22a/shuo-tan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shuo-tan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yan
    family: Shuo Tan
  - given: Abhineet
    family: Agarwal
  - given: Bin
    family: Yu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9663-9685
  id: shuo-tan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9663
  lastpage: 9685
  published: 2022-05-03 00:00:00 +0000
- title: ' Exact Community Recovery over Signed Graphs '
  abstract: ' Signed graphs encode similarity and dissimilarity relationships among different entities with positive and negative edges. In this paper, we study the problem of community recovery over signed graphs generated by the signed stochastic block model (SSBM) with two equal-sized communities. Our approach is based on the maximum likelihood estimation (MLE) of the SSBM. Unlike many existing approaches, our formulation reveals that the positive and negative edges of a signed graph should be treated unequally. We then propose a simple two-stage iterative algorithm for solving the regularized MLE. It is shown that in the logarithmic degree regime, the proposed algorithm can exactly recover the underlying communities in nearly-linear time at the information-theoretic limit. Numerical results on both synthetic and real data are reported to validate and complement our theoretical developments and demonstrate the efficacy of the proposed method. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22i.html
  PDF: https://proceedings.mlr.press/v151/wang22i/wang22i.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22i.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xiaolu
    family: Wang
  - given: Peng
    family: Wang
  - given: Anthony
    family: Man-Cho So
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9686-9710
  id: wang22i
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9686
  lastpage: 9710
  published: 2022-05-03 00:00:00 +0000
- title: ' Laplacian Constrained Precision Matrix Estimation: Existence and High Dimensional Consistency '
  abstract: ' This paper considers the problem of estimating high dimensional Laplacian constrained precision matrices by minimizing Stein’s loss. We obtain a necessary and sufficient condition for existence of this estimator, which consists of checking whether a certain data dependent graph is connected. We also prove consistency in the high dimensional setting under the symmetrized Stein loss. We show that the error rate does not depend on the graph sparsity, or other types of structure, and that Laplacian constraints are sufficient for high dimensional consistency. Our proofs exploit properties of graph Laplacians, the matrix tree theorem, and a characterization of the proposed estimator based on effective graph resistances. We validate our theoretical claims with numerical experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/pavez22a.html
  PDF: https://proceedings.mlr.press/v151/pavez22a/pavez22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-pavez22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Eduardo
    family: Pavez
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9711-9722
  id: pavez22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9711
  lastpage: 9722
  published: 2022-05-03 00:00:00 +0000
- title: ' Primal-Dual Stochastic Mirror Descent for MDPs '
  abstract: ' We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, a variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method. Using this algorithm, we get the first parallel algorithm for mixing average-reward MDPs with a generative model without reduction to discounted MDP. One of the main features of the presented method is low communication costs in a distributed centralized setting, even with very large networks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tiapkin22a.html
  PDF: https://proceedings.mlr.press/v151/tiapkin22a/tiapkin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tiapkin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Daniil
    family: Tiapkin
  - given: Alexander
    family: Gasnikov
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9723-9740
  id: tiapkin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9723
  lastpage: 9740
  published: 2022-05-03 00:00:00 +0000
- title: ' Convex Analysis of the Mean Field Langevin Dynamics '
  abstract: ' As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide neural networks in the mean field regime, and hence the convergence property of the dynamics is of great theoretical interest. In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings. The key ingredient of our proof is a proximal Gibbs distribution $p_q$ associated with the dynamics, which, in combination with techniques in Vempala and Wibisono (2019), allows us to develop a simple convergence theory parallel to classical results in convex optimization. Furthermore, we reveal that $p_q$ connects to the duality gap in the empirical risk minimization setting, which enables efficient empirical evaluation of the algorithm convergence. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nitanda22a.html
  PDF: https://proceedings.mlr.press/v151/nitanda22a/nitanda22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nitanda22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Atsushi
    family: Nitanda
  - given: Denny
    family: Wu
  - given: Taiji
    family: Suzuki
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9741-9757
  id: nitanda22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9741
  lastpage: 9757
  published: 2022-05-03 00:00:00 +0000
- title: ' Information-Theoretic Analysis of Epistemic Uncertainty in Bayesian Meta-learning '
  abstract: ' The overall predictive uncertainty of a trained predictor can be decomposed into separate contributions due to epistemic and aleatoric uncertainty. Under a Bayesian formulation, assuming a well-specified model, the two contributions can be exactly expressed (for the log-loss) or bounded (for more general losses) in terms of information-theoretic quantities (Xu and Raginsky [2020]). This paper addresses the study of epistemic uncertainty within an information-theoretic framework in the broader setting of Bayesian meta-learning. A general hierarchical Bayesian model is assumed in which hyperparameters determine the per-task priors of the model parameters. Exact characterizations (for the log-loss) and bounds (for more general losses) are derived for the epistemic uncertainty – quantified by the minimum excess meta-risk (MEMR) – of optimal meta-learning rules. This characterization is leveraged to bring insights into the dependence of the epistemic uncertainty on the number of tasks and on the amount of per-task training data. Experiments are presented that use the proposed information-theoretic bounds, evaluated via neural mutual information estimators, to compare the performance of conventional learning and meta-learning as the number of meta-learning tasks increases. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/theresa-jose22a.html
  PDF: https://proceedings.mlr.press/v151/theresa-jose22a/theresa-jose22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-theresa-jose22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sharu
    family: Theresa Jose
  - given: Sangwoo
    family: Park
  - given: Osvaldo
    family: Simeone
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9758-9775
  id: theresa-jose22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9758
  lastpage: 9775
  published: 2022-05-03 00:00:00 +0000
- title: ' Marginalising over Stationary Kernels with Bayesian Quadrature '
  abstract: ' Marginalising over families of Gaussian Process kernels produces flexible model classes with well-calibrated uncertainty estimates. Existing approaches require likelihood evaluations of many kernels, rendering them prohibitively expensive for larger datasets. We propose a Bayesian Quadrature scheme to make this marginalisation more efficient and thereby more practical. Through use of maximum mean discrepancies between distributions, we define a kernel over kernels that captures invariances between Spectral Mixture (SM) Kernels. Kernel samples are selected by generalising an information-theoretic acquisition function for warped Bayesian Quadrature. We show that our framework achieves more accurate predictions with better calibrated uncertainty than state-of-the-art baselines, especially when given limited (wall-clock) time budgets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hamid22a.html
  PDF: https://proceedings.mlr.press/v151/hamid22a/hamid22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hamid22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Saad
    family: Hamid
  - given: Sebastian
    family: Schulze
  - given: Michael A.
    family: Osborne
  - given: Stephen
    family: Roberts
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9776-9792
  id: hamid22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9776
  lastpage: 9792
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging '
  abstract: ' We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrast with the basic SEG method, whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/junchi-li22a.html
  PDF: https://proceedings.mlr.press/v151/junchi-li22a/junchi-li22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-junchi-li22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chris
    family: Junchi Li
  - given: Yaodong
    family: Yu
  - given: Nicolas
    family: Loizou
  - given: Gauthier
    family: Gidel
  - given: Yi
    family: Ma
  - given: Nicolas
    family: Le Roux
  - given: Michael
    family: Jordan
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9793-9826
  id: junchi-li22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9793
  lastpage: 9826
  published: 2022-05-03 00:00:00 +0000
- title: ' An Unsupervised Hunt for Gravitational Lenses '
  abstract: ' Strong gravitational lenses allow us to peer into the farthest reaches of space by bending the light from a background object around a massive object in the foreground. Unfortunately, these lenses are extremely rare, and manually finding them in astronomy surveys is difficult and time-consuming. We are thus tasked with finding them in an automated fashion with few, if any, known lenses to form positive samples. To assist us with training, we can simulate realistic lenses within our survey images to form positive samples. Naively training a ResNet model with these simulated lenses results in a poor precision for the desired high recall, because the simulations contain artifacts that are learned by the model. In this work, we develop a lens detection method that combines simulation, data augmentation, semi-supervised learning, and GANs to improve this performance by an order of magnitude. We perform ablation studies and examine how performance scales with the number of non-lenses and simulated lenses. These findings allow researchers to go into a survey mostly "blind" and still be able to classify strong gravitational lenses with high precision and recall. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sheng22a.html
  PDF: https://proceedings.mlr.press/v151/sheng22a/sheng22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sheng22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Stephen
    family: Sheng
  - given: Keerthi
    family: Vasan G C
  - given: Chi
    family: Po P Choi
  - given: James
    family: Sharpnack
  - given: Tucker
    family: Jones
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9827-9843
  id: sheng22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9827
  lastpage: 9843
  published: 2022-05-03 00:00:00 +0000
- title: ' Sobolev Transport: A Scalable Metric for Probability Measures with Graph Metrics '
  abstract: ' Optimal transport (OT) is a popular measure to compare probability distributions. However, OT suffers from a few drawbacks, such as (i) a high complexity for computation and (ii) indefiniteness, which limits its applicability to kernel machines. In this work, we consider probability measures supported on a graph metric space and propose a novel Sobolev transport metric. We show that the Sobolev transport metric yields a <em>closed-form</em> formula for fast computation and it is negative definite. We show that the space of probability measures endowed with this transport distance is isometric to a bounded convex set in a Euclidean space with a weighted l_p distance. We further exploit the negative definiteness of the Sobolev transport to design positive-definite kernels, and evaluate their performances against other baselines in document classification with word embeddings and in topological data analysis. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/le22b.html
  PDF: https://proceedings.mlr.press/v151/le22b/le22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-le22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tam
    family: Le
  - given: Truyen
    family: Nguyen
  - given: Dinh
    family: Phung
  - given: Viet
    family: Anh Nguyen
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9844-9868
  id: le22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9844
  lastpage: 9868
  published: 2022-05-03 00:00:00 +0000
- title: ' Mean Nyström Embeddings for Adaptive Compressive Learning '
  abstract: ' Compressive learning is an approach to efficient large scale learning based on sketching an entire dataset to a single mean embedding (the sketch), i.e. a vector of generalized moments. The learning task is then approximately solved as an inverse problem using an adapted parametric model. Previous works in this context have focused on sketches obtained by averaging random features, which, while universal, can be poorly adapted to the problem at hand. In this paper, we propose and study the idea of performing sketching based on data-dependent Nyström approximation. From a theoretical perspective we prove that the excess risk can be controlled under a geometric assumption relating the parametric model used to learn from the sketch and the covariance operator associated to the task at hand. Empirically, we show for k-means clustering and Gaussian modeling that for a fixed sketch size, Nyström sketches indeed outperform those built with random features. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chatalic22a.html
  PDF: https://proceedings.mlr.press/v151/chatalic22a/chatalic22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chatalic22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Antoine
    family: Chatalic
  - given: Luigi
    family: Carratino
  - given: Ernesto
    family: De Vito
  - given: Lorenzo
    family: Rosasco
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9869-9889
  id: chatalic22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9869
  lastpage: 9889
  published: 2022-05-03 00:00:00 +0000
- title: ' GraphAdaMix: Enhancing Node Representations with Graph Adaptive Mixtures '
  abstract: ' Graph Neural Networks (GNNs) are the current state-of-the-art models in learning node representations for many predictive tasks on graphs. Typically, GNNs reuse the same set of model parameters across all nodes in the graph to improve the training efficiency and exploit the translationally-invariant properties in many datasets. However, the parameter sharing scheme prevents GNNs from distinguishing two nodes having the same local structure, and the translation invariance property may not hold in real-world graphs. In this paper, we present Graph Adaptive Mixtures (GraphAdaMix), a novel approach for learning node representations in a graph by introducing multiple independent GNN models and a trainable mixture distribution for each node. GraphAdaMix can adapt to tasks with different settings. Specifically, for semi-supervised tasks, we optimize GraphAdaMix using the Expectation-Maximization (EM) algorithm, while in unsupervised settings, GraphAdaMix is trained following the paradigm of contrastive learning. We evaluate GraphAdaMix on ten benchmark datasets with extensive experiments. GraphAdaMix is demonstrated to consistently boost state-of-the-art GNN variants in semi-supervised and unsupervised node classification tasks. The code of GraphAdaMix is available online. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sun-handason-tam22a.html
  PDF: https://proceedings.mlr.press/v151/sun-handason-tam22a/sun-handason-tam22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sun-handason-tam22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Da
    family: Sun Handason Tam
  - given: Siyue
    family: Xie
  - given: Wing
    family: Cheong Lau
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9890-9907
  id: sun-handason-tam22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9890
  lastpage: 9907
  published: 2022-05-03 00:00:00 +0000
- title: ' Provable Adversarial Robustness for Fractional Lp Threat Models '
  abstract: ' In recent years, researchers have extensively studied adversarial robustness in a variety of threat models, including L_0, L_1, L_2, and L_infinity-norm bounded adversarial attacks. However, attacks bounded by fractional L_p "norms" (quasi-norms defined by the L_p distance with 0<p<1) have yet to be thoroughly considered. We proactively propose a defense with several desirable properties: it provides provable (certified) robustness, scales to ImageNet, and yields deterministic (rather than high-probability) certified guarantees when applied to quantized data (e.g., images). Our technique for fractional L_p robustness constructs expressive, deep classifiers that are globally Lipschitz with respect to the L_p^p metric, for any 0<p<1. However, our method is even more general: we can construct classifiers which are globally Lipschitz with respect to any metric defined as the sum of concave functions of components. Our approach builds on a recent work, Levine and Feizi (2021), which provides a provable defense against L_1 attacks. However, we demonstrate that our proposed guarantees are highly non-vacuous, compared to the trivial solution of using (Levine and Feizi, 2021) directly and applying norm inequalities. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/levine22a.html
  PDF: https://proceedings.mlr.press/v151/levine22a/levine22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-levine22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander J.
    family: Levine
  - given: Soheil
    family: Feizi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9908-9942
  id: levine22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9908
  lastpage: 9942
  published: 2022-05-03 00:00:00 +0000
- title: ' On Some Fast And Robust Classifiers For High Dimension, Low Sample Size Data '
  abstract: ' In high dimension, low sample size (HDLSS) settings, distance concentration phenomena affect the performance of several popular classifiers which are based on Euclidean distances. The behaviour of these classifiers in high dimensions is completely governed by the first and second order moments of the underlying class distributions. Moreover, the classifiers become useless for such HDLSS data when the first two moments of the competing distributions are equal, or when the moments do not exist. In this work, we propose robust, computationally efficient and tuning-free classifiers applicable in the HDLSS scenario. As the data dimension increases, these classifiers yield perfect classification if the one-dimensional marginals of the underlying distributions are different. We establish strong theoretical properties for the proposed classifiers in ultrahigh-dimensional settings. Numerical experiments with a wide variety of simulated examples and analysis of real data sets exhibit clear and convincing advantages over existing methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/roy22a.html
  PDF: https://proceedings.mlr.press/v151/roy22a/roy22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-roy22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sarbojit
    family: Roy
  - given: Jyotishka
    family: Ray Choudhury
  - given: Subhajit
    family: Dutta
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9943-9968
  id: roy22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9943
  lastpage: 9968
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Pareto-Efficient Decisions with Confidence '
  abstract: ' The paper considers the problem of multi-objective decision support when outcomes are uncertain. We extend the concept of Pareto-efficient decisions to take into account the uncertainty of decision outcomes across varying contexts. This enables quantifying trade-offs between decisions in terms of tail outcomes that are relevant in safety-critical applications. We propose a method for learning efficient decisions with statistical confidence, building on results from the conformal prediction literature. The method adapts to weak or nonexistent context covariate overlap and its statistical guarantees are evaluated using both synthetic and real data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ek22a.html
  PDF: https://proceedings.mlr.press/v151/ek22a/ek22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ek22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sofia
    family: Ek
  - given: Dave
    family: Zachariah
  - given: Peter
    family: Stoica
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9969-9981
  id: ek22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9969
  lastpage: 9981
  published: 2022-05-03 00:00:00 +0000
- title: ' Efficient and passive learning of networked dynamical systems driven by non-white exogenous inputs '
  abstract: ' We consider a networked linear dynamical system with p agents/nodes. We study the problem of learning the underlying graph of interactions/dependencies from observations of the nodal trajectories over a time-interval T. We present a regularized non-causal consistent estimator for this problem and analyze its sample complexity over two regimes: (a) where the interval T consists of n i.i.d. observation windows of length T/n (restart and record), and (b) where T is one continuous observation window (consecutive). Using the theory of M-estimators, we show that the estimator recovers the underlying interactions, in either regime, in a time-interval that is logarithmic in the system size p. To the best of our knowledge, this is the first work to analyze the sample complexity of learning linear dynamical systems driven by unobserved non-white wide-sense stationary (WSS) inputs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/doddi22a.html
  PDF: https://proceedings.mlr.press/v151/doddi22a/doddi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-doddi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Harish
    family: Doddi
  - given: Deepjyoti
    family: Deka
  - given: Saurav
    family: Talukdar
  - given: Murti
    family: Salapaka
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9982-9997
  id: doddi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9982
  lastpage: 9997
  published: 2022-05-03 00:00:00 +0000
- title: ' Compressed Rule Ensemble Learning '
  abstract: ' Ensembles of decision rules extracted from tree ensembles, like RuleFit, promise a good trade-off between predictive performance and model simplicity. However, they are affected by competing interests: while a sufficiently large number of binary, non-smooth rules is necessary to fit smooth, well-generalizing decision boundaries, too high a number of rules in the ensemble severely jeopardizes interpretability. As a way out of this dilemma, we propose to take an extra step during rule extraction and compress clusters of similar rules into ensemble rules. The outputs of the individual rules in each cluster are pooled to produce a single soft output, reflecting the original ensemble’s marginal smoothing behaviour. The final model, which we call Compressed Rule Ensemble (CRE), fits a linear combination of ensemble rules. We empirically show that CRE is both sparse and accurate on various datasets, carrying over the ensemble behaviour while remaining interpretable. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nalenz22a.html
  PDF: https://proceedings.mlr.press/v151/nalenz22a/nalenz22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nalenz22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Malte
    family: Nalenz
  - given: Thomas
    family: Augustin
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 9998-10014
  id: nalenz22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 9998
  lastpage: 10014
  published: 2022-05-03 00:00:00 +0000
- title: ' Pick-and-Mix Information Operators for Probabilistic ODE Solvers '
  abstract: ' Probabilistic numerical solvers for ordinary differential equations compute posterior distributions over the solution of an initial value problem via Bayesian inference. In this paper, we leverage their probabilistic formulation to seamlessly include additional information as general likelihood terms. We show that second-order differential equations should be directly provided to the solver, instead of transforming the problem to first order. Additionally, by including higher-order information or physical conservation laws in the model, solutions become more accurate and more physically meaningful. Lastly, we demonstrate the utility of flexible information operators by solving differential-algebraic equations. In conclusion, the probabilistic formulation of numerical solvers offers a flexible way to incorporate various types of information, thus improving the resulting solutions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bosch22a.html
  PDF: https://proceedings.mlr.press/v151/bosch22a/bosch22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bosch22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nathanael
    family: Bosch
  - given: Filip
    family: Tronarp
  - given: Philipp
    family: Hennig
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10015-10027
  id: bosch22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10015
  lastpage: 10027
  published: 2022-05-03 00:00:00 +0000
- title: ' Finite Sample Analysis of Mean-Volatility Actor-Critic for Risk-Averse Reinforcement Learning '
  abstract: ' The goal in the standard reinforcement learning problem is to find a policy that optimizes the expected return. However, such an objective is not adequate in a lot of real-life applications, like finance, where controlling the uncertainty of the outcome is imperative. The mean-volatility objective penalizes, through a tunable parameter, policies with high variance of the per-step reward. An interesting property of this objective is that it admits simple linear Bellman equations that resemble, up to a reward transformation, those of the risk-neutral case. However, the required reward transformation is policy-dependent, and requires the (usually unknown) expected return of the used policy. In this work, we propose two general methods for policy evaluation under the mean-volatility objective: the direct method and the factored method. We then extend recent results for finite sample analysis in the risk-neutral actor-critic setting to the mean-volatility case. Our analysis shows that the sample complexity to attain an $\epsilon$-accurate stationary point is the same as that of the risk-neutral version, using either policy evaluation method for training the critic. Finally, we carry out experiments to test the proposed methods in a simple environment that exhibits some trade-off between optimality, in expectation, and uncertainty of outcome. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/eldowa22a.html
  PDF: https://proceedings.mlr.press/v151/eldowa22a/eldowa22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-eldowa22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Khaled
    family: Eldowa
  - given: Lorenzo
    family: Bisi
  - given: Marcello
    family: Restelli
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10028-10066
  id: eldowa22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10028
  lastpage: 10066
  published: 2022-05-03 00:00:00 +0000
- title: ' The Importance of Future Information in Credit Card Fraud Detection '
  abstract: ' Fraud detection systems (FDS) mainly perform two tasks: (i) real-time detection while the payment is being processed and (ii) posterior detection to block the card retrospectively and avoid further frauds. Since human verification is often necessary and the payment processing time is limited, the second task manages the largest volume of transactions. In the literature, fraud detection challenges and algorithm performance are widely studied but the very formulation of the problem is never disrupted: it aims at predicting if a transaction is fraudulent based on its characteristics and the past transactions of the cardholder. Yet, in posterior detection, verification often takes days, so new payments on the card become available before a decision is taken. This is our motivation to propose a new paradigm: posterior fraud detection with "future" information. We start by providing evidence of the on-time availability of subsequent transactions, usable as extra context to improve detection. We then design a Bidirectional LSTM to make use of these transactions. On a real-world dataset with over 30 million transactions, it achieves higher performance than a regular LSTM, which is the state-of-the-art classifier for fraud detection that only uses the past context. We also introduce new metrics to show that the proposal catches more frauds and more compromised cards, and does so based on their earliest frauds. We believe that future work on this new paradigm will have a significant impact on the detection of compromised cards. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bach-nguyen22a.html
  PDF: https://proceedings.mlr.press/v151/bach-nguyen22a/bach-nguyen22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bach-nguyen22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Van
    family: Bach Nguyen
  - given: Kanishka
    family: Ghosh Dastidar
  - given: Michael
    family: Granitzer
  - given: Wissam
    family: Siblini
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10067-10077
  id: bach-nguyen22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10067
  lastpage: 10077
  published: 2022-05-03 00:00:00 +0000
- title: ' A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits '
  abstract: ' We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is, to the best of our knowledge, the first strategy with non-asymptotic bounds that asymptotically matches the sample complexity. But the main advantage over other algorithms like Track-and-Stop is an improved behavior regarding exploration: Exploration-Biased Sampling is biased towards exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are allowed by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe to be of high independent interest. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/barrier22a.html
  PDF: https://proceedings.mlr.press/v151/barrier22a/barrier22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-barrier22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Antoine
    family: Barrier
  - given: Aurélien
    family: Garivier
  - given: Tomáš
    family: Kocák
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10078-10109
  id: barrier22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10078
  lastpage: 10109
  published: 2022-05-03 00:00:00 +0000
- title: ' Differentially Private Federated Learning on Heterogeneous Data '
  abstract: ' Federated Learning (FL) is a paradigm for large-scale distributed learning which faces two key challenges: (i) training efficiently from highly heterogeneous user data, and (ii) protecting the privacy of participating users. In this work, we propose a novel FL approach (DP-SCAFFOLD) to tackle these two challenges together by incorporating Differential Privacy (DP) constraints into the popular SCAFFOLD algorithm. We focus on the challenging setting where users communicate with an “honest-but-curious” server without any trusted intermediary, which requires ensuring privacy not only towards a third party observing the final model but also towards the server itself. Using advanced results from DP theory, we establish the convergence of our algorithm for convex and non-convex objectives. Our paper clearly highlights the trade-off between utility and privacy and demonstrates the superiority of DP-SCAFFOLD over the state-of-the-art algorithm DP-FedAvg when the number of local updates and the level of heterogeneity grow. Our numerical results confirm our analysis and show that DP-SCAFFOLD provides significant gains in practice. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/noble22a.html
  PDF: https://proceedings.mlr.press/v151/noble22a/noble22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-noble22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Maxence
    family: Noble
  - given: Aurélien
    family: Bellet
  - given: Aymeric
    family: Dieuleveut
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10110-10145
  id: noble22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10110
  lastpage: 10145
  published: 2022-05-03 00:00:00 +0000
- title: ' Efficient computation of the volume of a polytope in high-dimensions using Piecewise Deterministic Markov Processes '
  abstract: ' Computing the volume of a polytope in high dimensions is computationally challenging but has wide applications. Current state-of-the-art algorithms to compute such volumes rely on efficient sampling of a Gaussian distribution restricted to the polytope, using e.g. Hamiltonian Monte Carlo. We present a new sampling strategy that uses a Piecewise Deterministic Markov Process. Like Hamiltonian Monte Carlo, this new method involves simulating trajectories of a non-reversible process and inherits similar good mixing properties. However, importantly, the process can be simulated more easily due to its piecewise linear trajectories – and this leads to a reduction of the computational cost by a factor of the dimension of the space. Our experiments indicate that our method is numerically robust and is one order of magnitude faster (or better) than existing methods using Hamiltonian Monte Carlo. On a single core processor, we report computational time of a few minutes up to dimension 500. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chevallier22a.html
  PDF: https://proceedings.mlr.press/v151/chevallier22a/chevallier22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chevallier22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Augustin
    family: Chevallier
  - given: Frédéric
    family: Cazals
  - given: Paul
    family: Fearnhead
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10146-10160
  id: chevallier22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10146
  lastpage: 10160
  published: 2022-05-03 00:00:00 +0000
- title: ' Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates '
  abstract: ' Triangular flows, also known as Knöthe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flows such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical models. In particular, we establish the statistical consistency and the finite sample convergence rates of the minimum Kullback-Leibler divergence statistical estimator of the Knöthe-Rosenblatt measure coupling using tools from empirical process theory. Our results highlight the anisotropic geometry of function classes at play in triangular flows, shed light on optimal coordinate ordering, and lead to statistical guarantees for Jacobian flows. We conduct numerical experiments to illustrate the practical implications of our theoretical findings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/irons22a.html
  PDF: https://proceedings.mlr.press/v151/irons22a/irons22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-irons22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nicholas J.
    family: Irons
  - given: Meyer
    family: Scetbon
  - given: Soumik
    family: Pal
  - given: Zaid
    family: Harchaoui
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10161-10195
  id: irons22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10161
  lastpage: 10195
  published: 2022-05-03 00:00:00 +0000
- title: ' Solving Marginal MAP Exactly by Probabilistic Circuit Transformations '
  abstract: ' Probabilistic circuits (PCs) are a class of tractable probabilistic models that allow efficient, often linear-time, inference of queries such as marginals and most probable explanations (MPE). However, marginal MAP, which is central to many decision-making problems, remains a hard query for PCs unless they satisfy highly restrictive structural constraints. In this paper, we develop a pruning algorithm that removes parts of the PC that are irrelevant to a marginal MAP query, shrinking the PC while maintaining the correct solution. This pruning technique is so effective that we are able to build a marginal MAP solver based solely on iteratively transforming the circuit—no search is required. We empirically demonstrate the efficacy of our approach on real-world datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/choi22b.html
  PDF: https://proceedings.mlr.press/v151/choi22b/choi22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-choi22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yoojung
    family: Choi
  - given: Tal
    family: Friedman
  - given: Guy
    family: Van Den Broeck
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10196-10208
  id: choi22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10196
  lastpage: 10208
  published: 2022-05-03 00:00:00 +0000
- title: ' REPID: Regional Effect Plots with implicit Interaction Detection '
  abstract: ' Machine learning models can automatically learn complex relationships, such as non-linear and interaction effects. Interpretable machine learning methods such as partial dependence plots visualize marginal feature effects but may lead to misleading interpretations when feature interactions are present. Hence, employing additional methods that can detect and measure the strength of interactions is paramount to better understand the inner workings of machine learning models. We demonstrate several drawbacks of existing global interaction detection approaches, characterize them theoretically, and evaluate them empirically. Furthermore, we introduce regional effect plots with implicit interaction detection, a novel framework to detect interactions between a feature of interest and other features. The framework also quantifies the strength of interactions and provides interpretable and distinct regions in which feature effects can be interpreted more reliably, as they are less confounded by interactions. We prove the theoretical eligibility of our method and show its applicability on various simulation and real-world examples. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/herbinger22a.html
  PDF: https://proceedings.mlr.press/v151/herbinger22a/herbinger22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-herbinger22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Julia
    family: Herbinger
  - given: Bernd
    family: Bischl
  - given: Giuseppe
    family: Casalicchio
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10209-10233
  id: herbinger22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10209
  lastpage: 10233
  published: 2022-05-03 00:00:00 +0000
- title: ' Minimal Expected Regret in Linear Quadratic Control '
  abstract: ' We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices $A$ and $B$ may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time $T$ is upper bounded (i) by $\widetilde{O}((d_u+d_x)\sqrt{d_xT})$ when $A$ and $B$ are unknown, (ii) by $\widetilde{O}(d_x^2\log(T))$ if only $A$ is unknown, and (iii) by $\widetilde{O}(d_x(d_u+d_x)\log(T))$ if only $B$ is unknown and under some mild non-degeneracy condition ($d_x$ and $d_u$ denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in $T$, $d_x$ and $d_u$ as they match existing lower bounds in scenario (i) when $d_x\le d_u$ [SF20], and in scenario (ii) [Lai86]. We conjecture that our upper bounds are also optimal in scenario (iii) (there is no known lower bound in this setting). Existing online algorithms proceed in epochs of (typically exponentially) growing durations. The control policy is fixed within each epoch, which considerably simplifies the analysis of the estimation error on $A$ and $B$ and hence of the regret. Our algorithm departs from this design choice: it is a simple variant of certainty-equivalence regulators, where the estimates of $A$ and $B$ and the resulting control policy can be updated as frequently as we wish, possibly at every step. Quantifying the impact of such a constantly-varying control policy on the performance of these estimates and on the regret constitutes one of the technical challenges tackled in this paper. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/jedra22a.html
  PDF: https://proceedings.mlr.press/v151/jedra22a/jedra22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-jedra22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yassir
    family: Jedra
  - given: Alexandre
    family: Proutiere
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10234-10321
  id: jedra22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10234
  lastpage: 10321
  published: 2022-05-03 00:00:00 +0000
- title: ' Data-splitting improves statistical performance in overparameterized regimes '
  abstract: ' While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time-consuming. Distributed learning is a common strategy to reduce the overall training time by exploiting multiple computing devices. Recently, it has been observed in the single machine setting that overparameterization is essential for benign overfitting in ridgeless regression in Hilbert spaces. We show that in this regime, data splitting has a regularizing effect, hence improving statistical performance and computational complexity at the same time. We further provide a unified framework that allows analyzing both the finite and infinite dimensional setting. We numerically demonstrate the effect of different model parameters. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/muecke22a.html
  PDF: https://proceedings.mlr.press/v151/muecke22a/muecke22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-muecke22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nicole
    family: Muecke
  - given: Enrico
    family: Reiss
  - given: Jonas
    family: Rungenhagen
  - given: Markus
    family: Klein
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10322-10350
  id: muecke22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10322
  lastpage: 10350
  published: 2022-05-03 00:00:00 +0000
- title: ' Towards Understanding Biased Client Selection in Federated Learning '
  abstract: ' Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without data sharing. Previous works analyzed the convergence of federated learning by accounting for data heterogeneity, communication/computation limitations, and partial client participation. However, most assume unbiased client participation, where clients are selected such that the aggregated model update is unbiased. In our work, we present the convergence analysis of federated learning with biased client selection and quantify how the bias affects convergence speed. We show that biasing client selection towards clients with higher local loss yields faster error convergence. From this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that flexibly spans the trade-off between convergence speed and solution bias. Extensive experiments demonstrate that Power-of-Choice can converge up to 3 times faster and give $10\%$ higher test accuracy than the baseline random selection. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/jee-cho22a.html
  PDF: https://proceedings.mlr.press/v151/jee-cho22a/jee-cho22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-jee-cho22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yae
    family: Jee Cho
  - given: Jianyu
    family: Wang
  - given: Gauri
    family: Joshi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10351-10375
  id: jee-cho22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10351
  lastpage: 10375
  published: 2022-05-03 00:00:00 +0000
- title: ' Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications '
  abstract: ' The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e. realizations of a random permutation $\Sigma$ of a finite set, $\{1,\; \ldots,\; n\}$ with $n\geq 1$ say. As it sheds light onto only one aspect of $\Sigma$’s distribution $P$, it may neglect other informative features. It is the purpose of this paper to define analogues of quantiles, ranks and statistical procedures based on such quantities for the analysis of ranking data by means of a metric-based notion of depth function on the symmetric group. Overcoming the absence of vector space structure on $\mathfrak{S}_n$, the latter defines a center-outward ordering of the permutations in the support of $P$ and extends the classic metric-based formulation of consensus ranking (medians corresponding then to the deepest permutations). The axiomatic properties that ranking depths should ideally possess are listed, while computational and generalization issues are studied at length. Beyond the theoretical analysis carried out, the relevance of the novel concepts and methods introduced for a wide variety of statistical tasks is also supported by numerous numerical experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/goibert22a.html
  PDF: https://proceedings.mlr.press/v151/goibert22a/goibert22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-goibert22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Morgane
    family: Goibert
  - given: Stephan
    family: Clemencon
  - given: Ekhine
    family: Irurozki
  - given: Pavlo
    family: Mozharovskyi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10376-10406
  id: goibert22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10376
  lastpage: 10406
  published: 2022-05-03 00:00:00 +0000
- title: ' Effective Nonlinear Feature Selection Method based on HSIC Lasso and with Variational Inference '
  abstract: ' HSIC Lasso is one of the most effective sparse nonlinear feature selection methods based on the Hilbert-Schmidt independence criterion. We propose an adaptive nonlinear feature selection method, which is based on the HSIC Lasso, that uses a stochastic model with a family of super-Gaussian prior distributions for sparsity enhancement. The method includes easily implementable closed-form update equations that are derived approximately from variational inference and can handle high-dimensional and large datasets. We applied the method to several synthetic datasets and real-world datasets and verified its effectiveness regarding redundancy, computational complexity, and classification and prediction accuracy using the selected features. The results indicate that the method can more effectively remove irrelevant features, leaving only relevant features. In certain problem settings, the method assigned non-zero importance only to the actually relevant features. This is an important characteristic for practical use. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/koyama22a.html
  PDF: https://proceedings.mlr.press/v151/koyama22a/koyama22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-koyama22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kazuki
    family: Koyama
  - given: Keisuke
    family: Kiritoshi
  - given: Tomomi
    family: Okawachi
  - given: Tomonori
    family: Izumitani
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10407-10421
  id: koyama22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10407
  lastpage: 10421
  published: 2022-05-03 00:00:00 +0000
- title: ' Sobolev Norm Learning Rates for Conditional Mean Embeddings '
  abstract: ' We develop novel learning rates for conditional mean embeddings by applying the theory of interpolation for reproducing kernel Hilbert spaces (RKHS). We derive explicit, adaptive convergence rates for the sample estimator under the misspecified setting, where the target operator is not Hilbert-Schmidt or bounded with respect to the input/output RKHSs. We demonstrate that in certain parameter regimes, we can achieve uniform convergence rates in the output RKHS. We hope our analyses will allow the much broader application of conditional mean embeddings to more complex ML/RL settings involving infinite dimensional RKHSs and continuous state spaces. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/talwai22a.html
  PDF: https://proceedings.mlr.press/v151/talwai22a/talwai22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-talwai22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Prem
    family: Talwai
  - given: Ali
    family: Shameli
  - given: David
    family: Simchi-Levi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10422-10447
  id: talwai22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10422
  lastpage: 10447
  published: 2022-05-03 00:00:00 +0000
- title: ' Provable Continual Learning via Sketched Jacobian Approximations '
  abstract: ' An important problem in machine learning is the ability to learn tasks in a sequential manner. If trained with standard first-order methods, most models forget previously learned tasks when trained on a new task, which is often referred to as catastrophic forgetting. A popular approach to overcome forgetting is to regularize the loss function by penalizing models that perform poorly on previous tasks. For example, elastic weight consolidation (EWC) regularizes with a quadratic form involving a diagonal matrix built based on past data. While EWC works very well for some setups, we show that, even under otherwise ideal conditions, it can provably suffer catastrophic forgetting if the diagonal matrix is a poor approximation of the Hessian matrix of previous tasks. We propose a simple approach to overcome this: Regularizing training of a new task with sketches of the Jacobian matrix of past data. This provably enables overcoming catastrophic forgetting for linear models and for wide neural networks, at the cost of memory. The overarching goal of this paper is to provide insights on when regularization-based continual learning algorithms work and under what memory costs. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/heckel22a.html
  PDF: https://proceedings.mlr.press/v151/heckel22a/heckel22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-heckel22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Reinhard
    family: Heckel
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10448-10470
  id: heckel22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10448
  lastpage: 10470
  published: 2022-05-03 00:00:00 +0000
- title: ' Neural Enhanced Dynamic Message Passing '
  abstract: ' Predicting stochastic spreading processes on complex networks is critical in epidemic control, opinion propagation, and viral marketing. We focus on the problem of inferring the time-dependent marginal probabilities of states for each node, which collectively quantify the spreading results. Dynamic Message Passing (DMP) has been developed as an efficient inference algorithm for several spreading models, and it is asymptotically exact on locally tree-like networks. However, DMP can struggle in diffusion networks with lots of local loops. We address this limitation by using Graph Neural Networks (GNN) to learn the dependency amongst messages implicitly. Specifically, we propose a hybrid model in which the GNN module runs jointly with DMP equations. The GNN module refines the aggregated messages in DMP iterations by learning from simulation data. We demonstrate numerically that after training, our model’s inference accuracy substantially outperforms DMP under various network structures and dynamics parameters. Moreover, compared to pure data-driven models, the proposed hybrid model has a better generalization ability for out-of-training cases, profiting from the explicitly utilized dynamics priors in the hybrid model. A PyTorch implementation of our model is at https://github.com/FeiGSSS/NEDMP. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/gao22b.html
  PDF: https://proceedings.mlr.press/v151/gao22b/gao22b.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-gao22b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Fei
    family: Gao
  - given: Jiang
    family: Zhang
  - given: Yan
    family: Zhang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10471-10482
  id: gao22b
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10471
  lastpage: 10482
  published: 2022-05-03 00:00:00 +0000
- title: ' Duel-based Deep Learning system for solving IQ tests '
  abstract: ' One of the relevant aspects of Artificial General Intelligence is the ability of machines to demonstrate abstract reasoning skills, for instance, through solving (human) IQ tests. This work presents a new approach to machine IQ tests solving formulated as Raven’s Progressive Matrices (RPMs), called Duel-IQ. The proposed solution incorporates the concept of a tournament in which the best answer is chosen based on a set of duels between candidate RPM answers. The three relevant aspects are: (1) low computational and design complexity, (2) proposition of two schemes of pairing up candidate answers for the duels and (3) evaluation of the system on a dataset of shapes other than those used for training. Depending on a particular variant, the system reaches up to $82.8\%$ accuracy on average in RPM tasks with 5 candidate answers and is on par with human performance and superior to other literature approaches of comparable complexity when training and test sets are from the same distribution. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/tomaszewska22a.html
  PDF: https://proceedings.mlr.press/v151/tomaszewska22a/tomaszewska22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-tomaszewska22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Paulina
    family: Tomaszewska
  - given: Adam
    family: Żychowski
  - given: Jacek
    family: Mańdziuk
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10483-10492
  id: tomaszewska22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10483
  lastpage: 10492
  published: 2022-05-03 00:00:00 +0000
- title: ' On a Connection Between Fast and Sparse Oblivious Subspace Embeddings '
  abstract: ' Fast Johnson-Lindenstrauss Transform (FJLT) and Sparse Johnson-Lindenstrauss Transform (SJLT) are two important oblivious subspace embeddings. So far, the developments of these two methods are almost orthogonal. In this work, we propose an iterative algorithm for oblivious subspace embedding which makes a connection between these two methods. The proposed method is built upon an iterative implementation of FJLT and is equipped with several theoretically motivated modifications. One important strategy we adopt is the early stopping strategy. On the one hand, the early stopping strategy makes our algorithm fast. On the other hand, it results in a sparse embedding matrix. As a result, the proposed algorithm is not only faster than the FJLT, but also faster than the SJLT with the same degree of sparsity. We present a general theoretical framework to analyze the embedding property of sparse embedding methods, which is used to prove the embedding property of the proposed method. This framework is also of independent interest. Lastly, we conduct numerical experiments to verify the good performance of the proposed algorithm. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22j.html
  PDF: https://proceedings.mlr.press/v151/wang22j/wang22j.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22j.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rui
    family: Wang
  - given: Wangli
    family: Xu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10493-10517
  id: wang22j
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10493
  lastpage: 10517
  published: 2022-05-03 00:00:00 +0000
- title: ' Modeling Conditional Dependencies in Multiagent Trajectories '
  abstract: ' We study modeling joint densities over sets of random variables (next-step movements of multiple agents) which are conditioned on aligned observations (past trajectories). For this setting, we propose an autoregressive approach to model intra-timestep dependencies, where distributions over joint movements are represented by autoregressive factorizations. In our approach, factors are randomly ordered and estimated with a graph neural network to account for permutation equivariance, while a recurrent neural network encodes past trajectories. We further propose a conditional two-stream attention mechanism, to allow for efficient training of random factorizations. We experiment on trajectory data from professional soccer matches and find that we model low frequency trajectories better than variational approaches. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/rudolph22a.html
  PDF: https://proceedings.mlr.press/v151/rudolph22a/rudolph22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-rudolph22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yannick
    family: Rudolph
  - given: Ulf
    family: Brefeld
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10518-10533
  id: rudolph22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10518
  lastpage: 10533
  published: 2022-05-03 00:00:00 +0000
- title: ' Testing Granger Non-Causality in Panels with Cross-Sectional Dependencies '
  abstract: ' This paper proposes a new approach for testing Granger non-causality on panel data. Instead of aggregating panel member statistics, we aggregate their corresponding p-values and show that the resulting p-value approximately bounds the type I error by the chosen significance level even if the panel members are dependent. We compare our approach against the most widely used Granger causality algorithm on panel data and show that our approach yields lower FDR at the same power for large sample sizes and panels with cross-sectional dependencies. Finally, we examine COVID-19 data about confirmed cases and deaths measured in countries/regions worldwide and show that our approach is able to discover the true causal relation between confirmed cases and deaths while state-of-the-art approaches fail. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/minorics22a.html
  PDF: https://proceedings.mlr.press/v151/minorics22a/minorics22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-minorics22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lenon
    family: Minorics
  - given: Caner
    family: Turkmen
  - given: David
    family: Kernert
  - given: Patrick
    family: Bloebaum
  - given: Laurent
    family: Callot
  - given: Dominik
    family: Janzing
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10534-10554
  id: minorics22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10534
  lastpage: 10554
  published: 2022-05-03 00:00:00 +0000
- title: ' Increasing the accuracy and resolution of precipitation forecasts using deep generative models '
  abstract: ' Accurately forecasting extreme rainfall is notoriously difficult, but is also ever more crucial for society as climate change increases the frequency of such extremes. Global numerical weather prediction models often fail to capture extremes, and are produced at too low a resolution to be actionable, while regional, high-resolution models are hugely expensive both in computation and labour. In this paper we explore the use of deep generative models to simultaneously correct and downscale (super-resolve) global ensemble forecasts over the Continental US. Specifically, using fine-grained radar observations as our ground truth, we train a conditional Generative Adversarial Network—coined CorrectorGAN—via a custom training procedure and augmented loss function, to produce ensembles of high-resolution, bias-corrected forecasts based on coarse, global precipitation forecasts in addition to other relevant meteorological fields. Our model outperforms an interpolation baseline, as well as super-resolution-only and CNN-based univariate methods, and approaches the performance of an operational regional high-resolution model across an array of established probabilistic metrics. Crucially, CorrectorGAN, once trained, produces predictions in seconds on a single machine. These results raise exciting questions about the necessity of regional models, and whether data-driven downscaling and correction methods can be transferred to data-poor regions that so far have had no access to high-resolution forecasts. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/price22a.html
  PDF: https://proceedings.mlr.press/v151/price22a/price22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-price22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ilan
    family: Price
  - given: Stephan
    family: Rasp
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10555-10571
  id: price22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10555
  lastpage: 10571
  published: 2022-05-03 00:00:00 +0000
- title: ' Tight bounds for minimum $\ell_1$-norm interpolation of noisy data '
  abstract: ' We provide matching upper and lower bounds of order $\sigma^2/\log(d/n)$ for the prediction error of the minimum $\ell_1$-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms when $d \gg n$, and is the first to imply asymptotic consistency of noisy minimum-norm interpolation for isotropic features and sparse ground truths. Our work complements the literature on "benign overfitting" for minimum $\ell_2$-norm interpolation, where asymptotic consistency can be achieved only when the features are effectively low-dimensional. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wang22k.html
  PDF: https://proceedings.mlr.press/v151/wang22k/wang22k.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wang22k.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Guillaume
    family: Wang
  - given: Konstantin
    family: Donhauser
  - given: Fanny
    family: Yang
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10572-10602
  id: wang22k
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10572
  lastpage: 10602
  published: 2022-05-03 00:00:00 +0000
- title: ' Multivariate Quantile Function Forecaster '
  abstract: ' We propose Multivariate Quantile Function Forecaster (MQF2), a global probabilistic forecasting method constructed using a multivariate quantile function and investigate its application to multi-horizon forecasting. Prior approaches are either autoregressive, implicitly capturing the dependency structure across time but exhibiting error accumulation with increasing forecast horizons, or multi-horizon sequence-to-sequence models, which do not exhibit error accumulation but typically do not model the dependency structure across time steps. MQF2 combines the benefits of both approaches, by directly making predictions in the form of a multivariate quantile function, defined as the gradient of a convex function which we parametrize using input-convex neural networks. By design, the quantile function is monotone with respect to the input quantile levels and hence avoids quantile crossing. We provide two options to train MQF2: with energy score or with maximum likelihood. Experimental results on real-world and synthetic datasets show that our model has comparable performance with state-of-the-art methods in terms of single time step metrics while capturing the time dependency structure. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kan22a.html
  PDF: https://proceedings.mlr.press/v151/kan22a/kan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kelvin
    family: Kan
  - given: François-Xavier
    family: Aubet
  - given: Tim
    family: Januschowski
  - given: Youngsuk
    family: Park
  - given: Konstantinos
    family: Benidis
  - given: Lars
    family: Ruthotto
  - given: Jan
    family: Gasthaus
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10603-10621
  id: kan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10603
  lastpage: 10621
  published: 2022-05-03 00:00:00 +0000
- title: ' Coresets for Data Discretization and Sine Wave Fitting '
  abstract: ' In the monitoring problem, the input is an unbounded stream $P=\{p_1,p_2,\cdots\}$ of integers in $[N]:=\{1,\cdots,N\}$, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single frequency $\sin$, e.g. $\min_{c\in C}cost(P,c)+\lambda(c)$, where $cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2\pi}{N} p_ic)$, $C\subseteq [N]$ is a feasible set of solutions, and $\lambda$ is a given regularization function. For any approximation error $\varepsilon>0$, we prove that every set $P$ of $n$ integers has a weighted subset $S\subseteq P$ (sometimes called core-set) of cardinality $|S|\in O(\log(N)^{O(1)})$ that approximates $cost(P,c)$ (for every $c\in [N]$) up to a multiplicative factor of $1\pm\varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/maalouf22a.html
  PDF: https://proceedings.mlr.press/v151/maalouf22a/maalouf22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-maalouf22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alaa
    family: Maalouf
  - given: Murad
    family: Tukan
  - given: Eric
    family: Price
  - given: Daniel M.
    family: Kane
  - given: Dan
    family: Feldman
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10622-10639
  id: maalouf22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10622
  lastpage: 10639
  published: 2022-05-03 00:00:00 +0000
- title: ' Non-separable Spatio-temporal Graph Kernels via SPDEs '
  abstract: ' Gaussian processes (GPs) provide a principled and direct approach for inference and learning on graphs. However, the lack of justified graph kernels for spatio-temporal modelling has held back their use in graph problems. We leverage an explicit link between stochastic partial differential equations (SPDEs) and GPs on graphs, introduce a framework for deriving graph kernels via SPDEs, and derive non-separable spatio-temporal graph kernels that capture interaction across space and time. We formulate the graph kernels for the stochastic heat equation and wave equation. We show that by providing novel tools for spatio-temporal GP modelling on graphs, we outperform pre-existing graph kernels in real-world applications that feature diffusion, oscillation, and other complicated interactions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nikitin22a.html
  PDF: https://proceedings.mlr.press/v151/nikitin22a/nikitin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nikitin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander V.
    family: Nikitin
  - given: St
    family: John
  - given: Arno
    family: Solin
  - given: Samuel
    family: Kaski
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10640-10660
  id: nikitin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10640
  lastpage: 10660
  published: 2022-05-03 00:00:00 +0000
- title: ' Towards an Understanding of Default Policies in Multitask Policy Optimization '
  abstract: ' Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization. Using these results, we then derive a principled RPO algorithm for multitask learning with strong performance guarantees. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/moskovitz22a.html
  PDF: https://proceedings.mlr.press/v151/moskovitz22a/moskovitz22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-moskovitz22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ted
    family: Moskovitz
  - given: Michael
    family: Arbel
  - given: Jack
    family: Parker-Holder
  - given: Aldo
    family: Pacchiano
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10661-10686
  id: moskovitz22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10661
  lastpage: 10686
  published: 2022-05-03 00:00:00 +0000
- title: ' Multiple Importance Sampling ELBO and Deep Ensembles of Variational Approximations '
  abstract: ' In variational inference (VI), the marginal log-likelihood is estimated using the standard evidence lower bound (ELBO), or improved versions such as the importance weighted ELBO (IWELBO). We propose the multiple importance sampling ELBO (MISELBO), a versatile yet simple framework. MISELBO is applicable in both amortized and classical VI, and it uses ensembles, e.g., deep ensembles, of independently inferred variational approximations. As far as we are aware, the concept of deep ensembles in amortized VI has not previously been established. We prove that MISELBO provides a tighter bound than the average of standard ELBOs, and demonstrate empirically that it gives tighter bounds than the average of IWELBOs. MISELBO is evaluated in density-estimation experiments that include MNIST and several real-data phylogenetic tree inference problems. First, on the MNIST dataset, MISELBO boosts the density-estimation performance of a state-of-the-art model, nouveau VAE. Second, in the phylogenetic tree inference setting, our framework enhances a state-of-the-art VI algorithm that uses normalizing flows. On top of its technical benefits, MISELBO unveils connections between VI and recent advances in the importance sampling literature, paving the way for further methodological advances. We provide our code at https://github.com/Lagergren-Lab/MISELBO. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kviman22a.html
  PDF: https://proceedings.mlr.press/v151/kviman22a/kviman22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kviman22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Oskar
    family: Kviman
  - given: Harald
    family: Melin
  - given: Hazal
    family: Koptagel
  - given: Victor
    family: Elvira
  - given: Jens
    family: Lagergren
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10687-10702
  id: kviman22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10687
  lastpage: 10702
  published: 2022-05-03 00:00:00 +0000
- title: ' Can Functional Transfer Methods Capture Simple Inductive Biases? '
  abstract: ' Transferring knowledge embedded in trained neural networks is a core problem in areas like model compression and continual learning. Among knowledge transfer approaches, functional transfer methods such as knowledge distillation and representational distance learning are particularly promising, since they allow for transferring knowledge across different architectures and tasks. Considering various characteristics of networks that are desirable to transfer, equivariance is a notable property that enables a network to capture valuable relationships in the data. We assess existing functional transfer methods on their ability to transfer equivariance and empirically show that they fail to even transfer shift equivariance, one of the simplest equivariances. Further theoretical analysis demonstrates that representational similarity methods, in fact, cannot guarantee the transfer of the intended equivariance. Motivated by these findings, we develop a novel transfer method that learns an equivariance model from a given teacher network and encourages the student network to acquire the same equivariance, via regularization. Experiments show that our method successfully transfers equivariance even in cases where highly restrictive methods, such as directly matching student and teacher representations, fail. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nix22a.html
  PDF: https://proceedings.mlr.press/v151/nix22a/nix22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nix22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arne
    family: Nix
  - given: Suhas
    family: Shrinivasan
  - given: Edgar Y.
    family: Walker
  - given: Fabian
    family: Sinz
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10703-10717
  id: nix22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10703
  lastpage: 10717
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Oracle Complexity of Higher-Order Smooth Non-Convex Finite-Sum Optimization '
  abstract: ' We prove lower bounds for higher-order methods in smooth non-convex finite-sum optimization. Our contribution is threefold: We first show that a deterministic algorithm cannot profit from the finite-sum structure of the objective and that simulating a pth-order regularized method on the whole function by constructing exact gradient information is optimal up to constant factors. We further show lower bounds for randomized algorithms and compare them with the best-known upper bounds. To address some gaps between the bounds, we propose a new second-order smoothness assumption that can be seen as an analogue of the first-order mean-squared smoothness assumption. We prove that it is sufficient to ensure state-of-the-art convergence guarantees while allowing for a sharper lower bound. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/emmenegger22a.html
  PDF: https://proceedings.mlr.press/v151/emmenegger22a/emmenegger22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-emmenegger22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nicolas
    family: Emmenegger
  - given: Rasmus
    family: Kyng
  - given: Ahad N.
    family: Zehmakan
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10718-10752
  id: emmenegger22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10718
  lastpage: 10752
  published: 2022-05-03 00:00:00 +0000
- title: ' Model-agnostic out-of-distribution detection using combined statistical tests '
  abstract: ' We present simple methods for out-of-distribution detection using a trained generative model. These techniques, based on classical statistical tests, are model-agnostic in the sense that they can be applied to any differentiable generative model. The idea is to combine a classical parametric test (Rao’s score test) with the recently introduced typicality test. These two test statistics are both theoretically well-founded and exploit different sources of information based on the likelihood for the typicality test and its gradient for the score test. We show that combining them using Fisher’s method overall leads to a more accurate out-of-distribution test. We also discuss the benefits of casting out-of-distribution detection as a statistical testing problem, noting in particular that false positive rate control can be valuable for practical out-of-distribution detection. Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms without any assumptions on the out-distribution. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bergamin22a.html
  PDF: https://proceedings.mlr.press/v151/bergamin22a/bergamin22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bergamin22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Federico
    family: Bergamin
  - given: Pierre-Alexandre
    family: Mattei
  - given: Jakob
    family: Drachmann Havtorn
  - given: Hugo
    family: Sénétaire
  - given: Hugo
    family: Schmutz
  - given: Lars
    family: Maaløe
  - given: Soren
    family: Hauberg
  - given: Jes
    family: Frellsen
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10753-10776
  id: bergamin22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10753
  lastpage: 10776
  published: 2022-05-03 00:00:00 +0000
- title: ' Generalized Group Testing '
  abstract: ' In the problem of classical group testing one aims to identify a small subset (of size $d$) of diseased individuals/defective items in a large population (of size $n$) via a minimal number of suitably-designed group tests on subsets of items, where the test outcome is positive iff the given test contains at least one defective item. Motivated by physical considerations, we consider a generalized setting that includes as special cases multiple other group-testing-like models in the literature. In our setting, which subsumes as special cases a variety of noiseless and noisy group-testing models in the literature, the test outcome is positive with probability $f(x)$, where $x$ is the number of defectives tested in a pool, and $f(\cdot)$ is an arbitrary {\it monotonically increasing} (stochastic) test function. Our main contributions are as follows. 1. We present a non-adaptive scheme that with probability $1-\varepsilon$ identifies all defective items. Our scheme requires at most ${\cal O}( H(f) d\log(n/\varepsilon))$ tests, where $H(f)$ is a suitably defined “sensitivity parameter” of $f(\cdot)$, and is never larger than ${\cal O}(d^{1+o(1)})$, but may be substantially smaller for many $f(\cdot)$. 2. We argue that any non-adaptive group testing scheme needs at least $\Omega (h(f) d\log(n/d))$ tests to ensure high reliability recovery. Here $h(f)$ is a suitably defined “concentration parameter” of $f(\cdot)$, and $h(f) \in \Omega{(1)}$. 3. We prove that our sample-complexity bounds for generalized group testing are information-theoretically near-optimal for a variety of sparse-recovery group-testing models in the literature. That is, for {\it any} “noisy” test function $f(\cdot)$ (i.e., $0< f(0) < f(d) <1$), and for a variety of “(one-sided) noiseless” test functions $f(\cdot)$ (i.e., either $f(0)=0$, or $f(d)=1$, or both) studied in the literature we show that $H(f)/h(f) \in \Theta(1)$. As a by-product we tightly characterize the heretofore open information-theoretic sample-complexity for the well-studied model of threshold group-testing. For general (near)-noiseless test functions $f(\cdot)$ we show that $H(f)/h(f) \in {\cal O}(d^{1+o(1)})$. We also demonstrate a “natural” test-function $f(\cdot)$ whose sample complexity scales “extremally” as $\Theta ( d^2\log(n))$, rather than $\Theta ( d\log(n))$ as in the case of classical group-testing. Some of our techniques may be of independent interest – in particular our achievability requires a delicate saddle-point approximation, and our impossibility proof relies on a novel bound relating the mutual information of a pair of random variables with the mean and variance of a specific function, and we derive novel structural results about monotone functions. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cheng22a.html
  PDF: https://proceedings.mlr.press/v151/cheng22a/cheng22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cheng22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xiwei
    family: Cheng
  - given: Sidharth
    family: Jaggi
  - given: Qiaoqiao
    family: Zhou
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10777-10835
  id: cheng22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10777
  lastpage: 10835
  published: 2022-05-03 00:00:00 +0000
- title: ' Adaptive A/B Test on Networks with Cluster Structures '
  abstract: ' Units in online A/B tests are often involved in social networks. Thus, their outcomes may depend on the treatment of their neighbors. Many such networks exhibit cluster structures, and these features can be used in the design to reduce the bias from network interference. When the average treatment effect (ATE) is considered from the individual perspective, conditions for the valid estimation restrict the use of these features in the design. We show that such restrictions can be alleviated if the ATE from the cluster perspective is considered. Using an illustrative example, we further show that the weights employed by the Horvitz-Thompson estimator may not appropriately accommodate the network structure, and purely relying on graph-cluster randomization may generate very unbalanced cluster-treated structures across the treatment arms. The measures of such structures for one cluster may depend on the treatment of other clusters and pose a great challenge for the design of A/B tests. To address these issues, we propose a rerandomized-adaptive randomization to balance the clusters and a cluster-adjusted estimator to alleviate the problem of the weights. Numerical studies are conducted to demonstrate the usage of the proposed procedure. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liu22g.html
  PDF: https://proceedings.mlr.press/v151/liu22g/liu22g.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liu22g.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yang
    family: Liu
  - given: Yifan
    family: Zhou
  - given: Ping
    family: Li
  - given: Feifang
    family: Hu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10836-10851
  id: liu22g
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10836
  lastpage: 10851
  published: 2022-05-03 00:00:00 +0000
- title: ' Spectral Robustness for Correlation Clustering Reconstruction in Semi-Adversarial Models '
  abstract: ' Correlation Clustering is an important clustering problem with many applications. We study the reconstruction version of this problem, in which one seeks to reconstruct a latent clustering that has been corrupted by random noise and adversarial modifications. Concerning the latter, there is a standard "post-adversarial" model in the literature, in which adversarial modifications come after the noise. Here, we introduce and analyse a "pre-adversarial" model, in which adversarial modifications come before the noise. Given an input coming from such a semi-adversarial generative model, the goal is to approximately reconstruct with high probability the latent clustering. We focus on the case where the hidden clusters have nearly equal size and show the following. In the pre-adversarial setting, spectral algorithms are optimal, in the sense that they reconstruct all the way to the information-theoretic threshold beyond which no reconstruction is possible. This is in contrast to the post-adversarial setting, in which their ability to restore the hidden clusters stops before the threshold, but the gap is optimally filled by SDP-based algorithms. These results highlight a heretofore unknown robustness of spectral algorithms, showing them less brittle than previously thought. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chierichetti22a.html
  PDF: https://proceedings.mlr.press/v151/chierichetti22a/chierichetti22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chierichetti22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Flavio
    family: Chierichetti
  - given: Alessandro
    family: Panconesi
  - given: Giuseppe
    family: Re
  - given: Luca
    family: Trevisan
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10852-10880
  id: chierichetti22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10852
  lastpage: 10880
  published: 2022-05-03 00:00:00 +0000
- title: ' Beyond Data Samples: Aligning Differential Networks Estimation with Scientific Knowledge '
  abstract: ' Learning the differential statistical dependency network between two contexts is essential for many real-life applications, mostly in the high-dimensional, low-sample regime. In this paper, we propose a novel differential network estimator that allows integrating various sources of knowledge beyond data samples. The proposed estimator is scalable to a large number of variables and achieves a sharp asymptotic convergence rate. Empirical experiments on extensive simulated data and four real-world applications (one on neuroimaging and three from functional genomics) show that our approach achieves improved differential network estimation and provides better support for downstream tasks like classification. Our results highlight significant benefits of integrating group, spatial and anatomic knowledge during differential genetic network identification and brain connectome change discovery. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sekhon22a.html
  PDF: https://proceedings.mlr.press/v151/sekhon22a/sekhon22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sekhon22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arshdeep
    family: Sekhon
  - given: Zhe
    family: Wang
  - given: Yanjun
    family: Qi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10881-10923
  id: sekhon22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10881
  lastpage: 10923
  published: 2022-05-03 00:00:00 +0000
- title: ' Two-way Sparse Network Inference for Count Data '
  abstract: ' Classically, statistical datasets have a larger number of data points than features ($n > p$). The standard model of classical statistics caters for the case where data points are considered conditionally independent given the parameters. However, for $n \approx p$ or $p > n$ such models are poorly determined. Kalaitzis et al. (2013) introduced the Bigraphical Lasso, an estimator for sparse precision matrices based on the Cartesian product of graphs. Unfortunately, the original Bigraphical Lasso algorithm is not applicable when $p$ and $n$ are large, due to memory requirements. We exploit eigenvalue decomposition of the Cartesian product graph to present a more efficient version of the algorithm which reduces memory requirements from $O(n^2p^2)$ to $O(n^2 +p^2)$. Many datasets in different application fields, such as biology, medicine and social science, come with count data, for which Gaussian-based models are not applicable. Our multiway network inference approach can be used for discrete data. Our methodology accounts for the dependencies across both instances and features, reduces the computational complexity for high-dimensional data, and handles both discrete and continuous data. Numerical studies on both synthetic and real datasets are presented to showcase the performance of our method. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/li22g.html
  PDF: https://proceedings.mlr.press/v151/li22g/li22g.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-li22g.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sijia
    family: Li
  - given: Martín
    family: López-García
  - given: Neil D.
    family: Lawrence
  - given: Luisa
    family: Cutillo
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10924-10938
  id: li22g
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10924
  lastpage: 10938
  published: 2022-05-03 00:00:00 +0000
- title: ' Rapid Convergence of Informed Importance Tempering '
  abstract: ' Informed Markov chain Monte Carlo (MCMC) methods have been proposed as scalable solutions to Bayesian posterior computation on high-dimensional discrete state spaces, but theoretical results about their convergence behavior in general settings are lacking. In this article, we propose a class of MCMC schemes called informed importance tempering (IIT), which combine importance sampling and informed local proposals, and derive generally applicable spectral gap bounds for IIT estimators. Our theory shows that IIT samplers have remarkable scalability when the target posterior distribution concentrates on a small set. Further, both our theory and numerical experiments demonstrate that the informed proposal should be chosen with caution: the performance may be very sensitive to the shape of the target distribution. We find that the “square-root proposal weighting” scheme tends to perform well in most settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhou22e.html
  PDF: https://proceedings.mlr.press/v151/zhou22e/zhou22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhou22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Quan
    family: Zhou
  - given: Aaron
    family: Smith
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10939-10965
  id: zhou22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10939
  lastpage: 10965
  published: 2022-05-03 00:00:00 +0000
- title: ' On the Implicit Bias of Gradient Descent for Temporal Extrapolation '
  abstract: ' When using recurrent neural networks (RNNs) it is common practice to apply trained models to sequences longer than those seen in training. This “extrapolating” usage deviates from the traditional statistical learning setup where guarantees are provided under the assumption that train and test distributions are identical. Here we set out to understand when RNNs can extrapolate, focusing on a simple case where the data generating distribution is memoryless. We first show that even with infinite training data, there exist RNN models that interpolate perfectly (i.e., they fit the training data) yet extrapolate poorly to longer sequences. We then show that if gradient descent is used for training, learning will converge to perfect extrapolation under certain assumptions on initialization. Our results complement recent studies on the implicit bias of gradient descent, showing that it plays a key role in extrapolation when learning temporal prediction models. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/cohen-karlik22a.html
  PDF: https://proceedings.mlr.press/v151/cohen-karlik22a/cohen-karlik22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-cohen-karlik22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Edo
    family: Cohen-Karlik
  - given: Avichai
    family: Ben David
  - given: Nadav
    family: Cohen
  - given: Amir
    family: Globerson
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10966-10981
  id: cohen-karlik22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10966
  lastpage: 10981
  published: 2022-05-03 00:00:00 +0000
- title: ' Differentiable Bayesian inference of SDE parameters using a pathwise series expansion of Brownian motion '
  abstract: ' By invoking a pathwise series expansion of Brownian motion, we propose to approximate a stochastic differential equation (SDE) with an ordinary differential equation (ODE). This allows us to reformulate Bayesian inference for an SDE as the parameter estimation task for an ODE. Unlike a nonlinear SDE, the likelihood for an ODE model is tractable and its gradient can be obtained using adjoint sensitivity analysis. This reformulation allows us to use efficient samplers, such as NUTS, that rely on the gradient of the log posterior. Applying the reparameterisation trick, variational inference can also be used for the same estimation task. We illustrate the proposed method on a variety of SDE models. We obtain similar parameter estimates when compared to data augmentation techniques. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ghosh22a.html
  PDF: https://proceedings.mlr.press/v151/ghosh22a/ghosh22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ghosh22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sanmitra
    family: Ghosh
  - given: Paul J.
    family: Birrell
  - given: Daniela
    family: De Angelis
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10982-10998
  id: ghosh22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10982
  lastpage: 10998
  published: 2022-05-03 00:00:00 +0000
- title: ' Quadric Hypersurface Intersection for Manifold Learning in Feature Space '
  abstract: ' The knowledge that data lies close to a particular submanifold of the ambient Euclidean space may be useful in a number of ways. For instance, one may want to automatically mark any point far away from the submanifold as an outlier or to use the geometry to come up with a better distance metric. Manifold learning problems are often posed in a very high dimension, e.g. for spaces of images or spaces of words. Today, with deep representation learning on the rise in areas such as computer vision and natural language processing, many problems of this kind may be transformed into problems of moderately high dimension, typically of the order of hundreds. Motivated by this, we propose a manifold learning technique suitable for moderately high dimension and large datasets. The manifold is learned from the training data in the form of an intersection of quadric hypersurfaces—simple but expressive objects. At test time, this manifold can be used to introduce a computationally efficient outlier score for arbitrary new data points and to improve a given similarity metric by incorporating the learned geometric structure into it. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/pavutnitskiy22a.html
  PDF: https://proceedings.mlr.press/v151/pavutnitskiy22a/pavutnitskiy22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-pavutnitskiy22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Fedor
    family: Pavutnitskiy
  - given: Sergei O.
    family: Ivanov
  - given: Evgeniy
    family: Abramov
  - given: Viacheslav
    family: Borovitskiy
  - given: Artem
    family: Klochkov
  - given: Viktor
    family: Vyalov
  - given: Anatolii
    family: Zaikovskii
  - given: Aleksandr
    family: Petiushko
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 10999-11013
  id: pavutnitskiy22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 10999
  lastpage: 11013
  published: 2022-05-03 00:00:00 +0000
- title: ' Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons '
  abstract: ' In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items. Thus, a uniform querying strategy over users may not be optimal. To address this issue, we propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons from multiple users and improves the users’ average accuracy by maintaining an active set of users. We prove that our algorithm can return the true ranking of items with high probability. We also provide a sample complexity bound for the proposed algorithm, which outperforms the non-active strategies in the literature and is close to the oracle under mild conditions. Experiments are provided to show the empirical advantage of the proposed methods over the state-of-the-art baselines. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/wu22f.html
  PDF: https://proceedings.mlr.press/v151/wu22f/wu22f.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-wu22f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yue
    family: Wu
  - given: Tao
    family: Jin
  - given: Hao
    family: Lou
  - given: Pan
    family: Xu
  - given: Farzad
    family: Farnoud
  - given: Quanquan
    family: Gu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11014-11036
  id: wu22f
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11014
  lastpage: 11036
  published: 2022-05-03 00:00:00 +0000
- title: ' Outcome Assumptions and Duality Theory for Balancing Weights '
  abstract: ' We study balancing weight estimators, which reweight outcomes from a source population to estimate missing outcomes in a target population. These estimators minimize the worst-case error by making an assumption about the outcome model. In this paper, we show that this outcome assumption has two immediate implications. First, we can replace the minimax optimization problem for balancing weights with a simple convex loss over the assumed outcome function class. Second, we can replace the commonly-made overlap assumption with a more appropriate quantitative measure, the minimum worst-case bias. Finally, we show conditions under which the weights remain robust when our assumptions on the outcomes are wrong. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/bruns-smith22a.html
  PDF: https://proceedings.mlr.press/v151/bruns-smith22a/bruns-smith22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-bruns-smith22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: David A.
    family: Bruns-Smith
  - given: Avi
    family: Feller
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11037-11055
  id: bruns-smith22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11037
  lastpage: 11055
  published: 2022-05-03 00:00:00 +0000
- title: ' Predicting the utility of search spaces for black-box optimization: a simple, budget-aware approach '
  abstract: ' Black box optimization requires specifying a search space to explore for solutions, e.g. a d-dimensional compact space, and this choice is critical for getting the best results at a reasonable budget. Unfortunately, determining a high quality search space can be challenging in many applications. For example, when tuning hyperparameters for machine learning pipelines on a new problem given a limited budget, one must strike a balance between excluding potentially promising regions and keeping the search space small enough to be tractable. The goal of this work is to motivate—through example applications in tuning deep neural networks—the problem of predicting the quality of search spaces conditioned on budgets, as well as to provide a simple scoring method based on a utility function applied to a probabilistic response surface model, similar to Bayesian optimization. We show that the method we present can compute meaningful budget-conditional scores in a variety of situations. We also provide experimental evidence that accurate scores can be useful in constructing and pruning search spaces. Ultimately, we believe scoring search spaces should become standard practice in the experimental workflow for deep learning. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ariafar22a.html
  PDF: https://proceedings.mlr.press/v151/ariafar22a/ariafar22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ariafar22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Setareh
    family: Ariafar
  - given: Justin
    family: Gilmer
  - given: Zachary
    family: Nado
  - given: Jasper
    family: Snoek
  - given: Rodolphe
    family: Jenatton
  - given: George
    family: Dahl
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11056-11071
  id: ariafar22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11056
  lastpage: 11071
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning from Multiple Noisy Partial Labelers '
  abstract: ' Programmatic weak supervision creates models without hand-labeled training data by combining the outputs of heuristic labelers. Existing frameworks make the restrictive assumption that labelers output a single class label. Enabling users to create partial labelers that output subsets of possible class labels would greatly expand the expressivity of programmatic weak supervision. We introduce this capability by defining a probabilistic generative model that can estimate the underlying accuracies of multiple noisy partial labelers without ground truth labels. We show how to scale up learning, for example learning on 100k examples in one minute, a 300$\times$ speedup compared to a naive implementation. We also prove that this class of models is generically identifiable up to label swapping under mild conditions. We evaluate our framework on three text classification and six object classification tasks. On text tasks, adding partial labels increases average accuracy by 8.6 percentage points. On image tasks, we show that partial labels allow us to approach some zero-shot object classification problems with programmatic weak supervision by using class attributes as partial labelers. On these tasks, our framework has accuracy comparable to recent embedding-based zero-shot learning methods, while using only pre-trained attribute detectors. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/yu22c.html
  PDF: https://proceedings.mlr.press/v151/yu22c/yu22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-yu22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Peilin
    family: Yu
  - given: Tiffany
    family: Ding
  - given: Stephen H.
    family: Bach
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11072-11095
  id: yu22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11072
  lastpage: 11095
  published: 2022-05-03 00:00:00 +0000
- title: ' Forward Looking Best-Response Multiplicative Weights Update Methods for Bilinear Zero-sum Games '
  abstract: ' Our work focuses on extra gradient learning algorithms for finding Nash equilibria in bilinear zero-sum games. The proposed method, which can be formally considered as a variant of Optimistic Mirror Descent (Mertikopoulos et al., 2019), uses a large learning rate for the intermediate gradient step which essentially leads to computing (approximate) best response strategies against the profile of the previous iteration. Although counter-intuitive at first sight due to the irrationally large, for an iterative algorithm, intermediate learning step, we prove that the method guarantees last-iterate convergence to an equilibrium. Particularly, we show that the algorithm reaches first an $\eta^{1/\rho}$-approximate Nash equilibrium, with $\rho > 1$, by decreasing the Kullback-Leibler divergence of each iterate by at least $\Omega(\eta^{1+\frac{1}{\rho}})$, for sufficiently small learning rate $\eta$, until the method becomes a contracting map, and converges to the exact equilibrium. Furthermore, we perform experimental comparisons with the optimistic variant of the multiplicative weights update method, by Daskalakis and Panageas (2019) and show that our algorithm has significant practical potential since it offers substantial gains in terms of accelerated convergence. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/fasoulakis22a.html
  PDF: https://proceedings.mlr.press/v151/fasoulakis22a/fasoulakis22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-fasoulakis22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Michail
    family: Fasoulakis
  - given: Evangelos
    family: Markakis
  - given: Yannis
    family: Pantazis
  - given: Constantinos
    family: Varsos
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11096-11117
  id: fasoulakis22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11096
  lastpage: 11117
  published: 2022-05-03 00:00:00 +0000
- title: ' Hypergraph Simultaneous Generators '
  abstract: ' Generative models for affiliation networks condition the edges on the membership of their nodes to communities. The problem of community detection under these models is addressed by inferring the membership parameters from the network structure. Current models make several unrealistic assumptions to make the inference feasible, and are mostly designed to work on regular graphs that cannot handle multi-way connections between nodes. While the models designed for hypergraphs attempt to capture the latter, they add further strict assumptions on the structure and size of hyperedges and are usually computationally intractable for real data. This paper proposes an efficient probabilistic generative model for detecting overlapping communities that processes hyperedges without any changes or restrictions on their size. Our model represents the entire state space of the hyperedges, which is exponential in the number of nodes. We develop a mathematical computation reduction scheme that reduces the inference time to linear in the volume of the hypergraph without sacrificing precision. Our experimental results validate the effectiveness and scalability of our model and demonstrate the superiority of our approach over state-of-the-art community detection methods. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/pedrood22a.html
  PDF: https://proceedings.mlr.press/v151/pedrood22a/pedrood22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-pedrood22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Bahman
    family: Pedrood
  - given: Carlotta
    family: Domeniconi
  - given: Kathryn
    family: Laskey
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11118-11130
  id: pedrood22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11118
  lastpage: 11130
  published: 2022-05-03 00:00:00 +0000
- title: ' Amortised Likelihood-free Inference for Expensive Time-series Simulators with Signatured Ratio Estimation '
  abstract: ' Simulation models of complex dynamics in the natural and social sciences commonly lack a tractable likelihood function, rendering traditional likelihood-based statistical inference impossible. Recent advances in machine learning have introduced novel algorithms for estimating otherwise intractable likelihood functions using a likelihood ratio trick based on binary classifiers. Consequently, efficient likelihood approximations can be obtained whenever good probabilistic classifiers can be constructed. We propose a kernel classifier for sequential data using <em>path signatures</em> based on the recently introduced signature kernel. We demonstrate that the representative power of signatures yields a highly performant classifier, even in the crucially important case where sample numbers are low. In such scenarios, our approach can outperform sophisticated neural networks for common posterior inference tasks. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/dyer22a.html
  PDF: https://proceedings.mlr.press/v151/dyer22a/dyer22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-dyer22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Joel
    family: Dyer
  - given: Patrick W.
    family: Cannon
  - given: Sebastian M.
    family: Schmon
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11131-11144
  id: dyer22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11131
  lastpage: 11144
  published: 2022-05-03 00:00:00 +0000
- title: ' Robust Training in High Dimensions via Block Coordinate Geometric Median Descent '
  abstract: ' Geometric median (GM) is a classical method in statistics for achieving robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 1/2. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) in high-dimensional optimization problems. In this paper, we show that by applying GM to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 1/2 for smooth non-convex problems, with non-asymptotic convergence rates comparable to the SGD with GM while resulting in significant speedup in training. We further validate the run-time and robustness of our approach empirically on several popular deep learning tasks. Code available at: https://github.com/anishacharya/BGMD '
  volume: 151
  URL: https://proceedings.mlr.press/v151/acharya22a.html
  PDF: https://proceedings.mlr.press/v151/acharya22a/acharya22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-acharya22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Anish
    family: Acharya
  - given: Abolfazl
    family: Hashemi
  - given: Prateek
    family: Jain
  - given: Sujay
    family: Sanghavi
  - given: Inderjit S.
    family: Dhillon
  - given: Ufuk
    family: Topcu
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11145-11168
  id: acharya22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11145
  lastpage: 11168
  published: 2022-05-03 00:00:00 +0000
- title: ' Stateful Offline Contextual Policy Evaluation and Learning '
  abstract: ' We study off-policy evaluation and learning from sequential data in a structured class of Markov decision processes that arise from repeated interactions with an exogenous sequence of arrivals with contexts, which generate unknown individual-level responses to agent actions. This model can be thought of as an offline generalization of contextual bandits with resource constraints. We formalize the relevant causal structure of problems such as dynamic personalized pricing and other operations management problems in the presence of potentially high-dimensional user types. The key insight is that an individual-level response is often not causally affected by the state variable and can therefore easily be generalized across timesteps and states. When this is true, we study implications for (doubly robust) off-policy evaluation and learning by instead leveraging single time-step evaluation, estimating the expectation over a single arrival via data from a population, for fitted-value iteration in a marginal MDP. We study sample complexity and analyze error amplification that leads to the persistence, rather than attenuation, of confounding error over time. In simulations of dynamic and capacitated pricing, we show improved out-of-sample policy performance in this class of relevant problems. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/kallus22a.html
  PDF: https://proceedings.mlr.press/v151/kallus22a/kallus22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-kallus22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nathan
    family: Kallus
  - given: Angela
    family: Zhou
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11169-11194
  id: kallus22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11169
  lastpage: 11194
  published: 2022-05-03 00:00:00 +0000
- title: ' Sample Complexity of Policy-Based Methods under Off-Policy Sampling and Linear Function Approximation '
  abstract: ' In this work, we study policy-based methods for solving the reinforcement learning problem, where off-policy sampling and linear function approximation are employed for policy evaluation, and various policy update rules (including natural policy gradient) are considered for policy improvement. To solve the policy evaluation sub-problem in the presence of the deadly triad, we propose a generic algorithm framework of multi-step TD-learning with generalized importance sampling ratios, which includes two specific algorithms: the $\lambda$-averaged $Q$-trace and the two-sided $Q$-trace. The generic algorithm is single time-scale, has provable finite-sample guarantees, and overcomes the high variance issue in off-policy learning. As for the policy improvement, we provide a universal analysis that establishes geometric convergence of various policy update rules, which leads to an overall $\Tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/chen22i.html
  PDF: https://proceedings.mlr.press/v151/chen22i/chen22i.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-chen22i.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zaiwei
    family: Chen
  - given: Siva
    family: Theja Maguluri
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11195-11214
  id: chen22i
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11195
  lastpage: 11214
  published: 2022-05-03 00:00:00 +0000
- title: ' Solving Multi-Arm Bandit Using a Few Bits of Communication '
  abstract: ' The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart that requires only a few (as low as 3) bits to be sent per iteration while preserving the same regret bound. Our lower bound is established via constructing hard instances from a subgaussian distribution. Our theory is further corroborated by numerical experiments. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/hanna22a.html
  PDF: https://proceedings.mlr.press/v151/hanna22a/hanna22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-hanna22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Osama A.
    family: Hanna
  - given: Lin
    family: Yang
  - given: Christina
    family: Fragouli
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11215-11236
  id: hanna22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11215
  lastpage: 11236
  published: 2022-05-03 00:00:00 +0000
- title: ' Causal Effect Identification with Context-specific Independence Relations of Control Variables '
  abstract: ' We study the problem of causal effect identification from observational distribution given the causal graph and some context-specific independence (CSI) relations. It was recently shown that this problem is NP-hard, and while a sound algorithm to learn the causal effects is proposed in Tikka et al. (2019), no complete algorithm for the task exists. In this work, we propose a sound and complete algorithm for the setting when the CSI relations are limited to observed nodes with no parents in the causal graph. One limitation of the state of the art in terms of its applicability is that the CSI relations among all variables, even unobserved ones, must be given (as opposed to learned). Instead, we introduce a set of graphical constraints under which the CSI relations can be learned from mere observational distribution. This expands the set of identifiable causal effects beyond the state of the art. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/mokhtarian22a.html
  PDF: https://proceedings.mlr.press/v151/mokhtarian22a/mokhtarian22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-mokhtarian22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ehsan
    family: Mokhtarian
  - given: Fateme
    family: Jamshidi
  - given: Jalal
    family: Etesami
  - given: Negar
    family: Kiyavash
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11237-11246
  id: mokhtarian22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11237
  lastpage: 11246
  published: 2022-05-03 00:00:00 +0000
- title: ' Entropy Regularized Optimal Transport Independence Criterion '
  abstract: ' We introduce an independence criterion based on entropy regularized optimal transport. Our criterion can be used to test for independence between two samples. We establish non-asymptotic bounds for our test statistic and study its statistical behavior under both the null hypothesis and the alternative hypothesis. The theoretical results involve tools from U-process theory and optimal transport theory. We also offer a random feature type approximation for large-scale problems, as well as a differentiable program implementation for deep learning applications. We present experimental results on existing benchmarks for independence testing, illustrating the interest of the proposed criterion to capture both linear and nonlinear dependencies in synthetic data and real data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/liu22h.html
  PDF: https://proceedings.mlr.press/v151/liu22h/liu22h.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-liu22h.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lang
    family: Liu
  - given: Soumik
    family: Pal
  - given: Zaid
    family: Harchaoui
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11247-11279
  id: liu22h
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11247
  lastpage: 11279
  published: 2022-05-03 00:00:00 +0000
- title: ' Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions '
  abstract: ' Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a "factored" structure, which can be modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial time algorithm for RL in Factored State MDPs (generalizing FMDPs) that neither relies on an oracle planner nor requires a linear transition model; it only requires a linear value function with a suitable local basis with respect to the factorization, permitting efficient variable elimination. With this assumption, we can solve this family of Factored State MDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work on FMDPs, we do not assume that the transitions on various factors are conditionally independent. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/deng22c.html
  PDF: https://proceedings.mlr.press/v151/deng22c/deng22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-deng22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zihao
    family: Deng
  - given: Siddartha
    family: Devic
  - given: Brendan
    family: Juba
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11280-11304
  id: deng22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11280
  lastpage: 11304
  published: 2022-05-03 00:00:00 +0000
- title: ' PAC Learning of Quantum Measurement Classes : Sample Complexity Bounds and Universal Consistency '
  abstract: ' We formulate a quantum analogue of the fundamental classical PAC learning problem. As on a quantum computer, we model data to be encoded by modifying specific attributes - spin axis of an electron, plane of polarization of a photon - of sub-atomic particles. Any interaction, including reading off, extracting or learning from such data is via quantum measurements, thus leading us to a problem of PAC learning Quantum Measurement Classes. We propose and analyze the sample complexity of a new ERM algorithm that respects quantum non-commutativity. Our study entails that we define the VC dimension of Positive Operator Valued Measure(ments) (POVMs) concept classes. Our sample complexity bounds involve optimizing over partitions of jointly measurable classes. Finally, we identify universally consistent sequences of POVM classes. Technical components of this work include computations involving tensor products, trace and uniform convergence bounds. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/padakandla22a.html
  PDF: https://proceedings.mlr.press/v151/padakandla22a/padakandla22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-padakandla22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Arun
    family: Padakandla
  - given: Abram
    family: Magner
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11305-11319
  id: padakandla22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11305
  lastpage: 11319
  published: 2022-05-03 00:00:00 +0000
- title: ' Can we Generalize and Distribute Private Representation Learning? '
  abstract: ' We study the problem of learning representations that are private yet informative, i.e., provide information about intended "ally" targets while hiding sensitive "adversary" attributes. We propose Exclusion-Inclusion Generative Adversarial Network (EIGAN), a generalized private representation learning (PRL) architecture that accounts for multiple ally and adversary attributes, unlike existing PRL solutions. While a centrally aggregated dataset is a prerequisite for most PRL techniques, data in the real world is often siloed across multiple distributed nodes unwilling to share the raw data because of privacy concerns. We address this practical constraint by developing D-EIGAN, the first distributed PRL method that learns representations at each node without transmitting the source data. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and the impact of dependencies among ally and adversary tasks on the optimization objective. Our experiments on various datasets demonstrate the advantages of EIGAN in terms of performance, robustness, and scalability. In particular, EIGAN outperforms the previous state-of-the-art by a significant accuracy margin ($47\%$ improvement), and D-EIGAN’s performance is consistently on par with EIGAN under different network settings. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/shams-azam22a.html
  PDF: https://proceedings.mlr.press/v151/shams-azam22a/shams-azam22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-shams-azam22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sheikh
    family: Shams Azam
  - given: Taejin
    family: Kim
  - given: Seyyedali
    family: Hosseinalipour
  - given: Carlee
    family: Joe-Wong
  - given: Saurabh
    family: Bagchi
  - given: Christopher
    family: Brinton
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11320-11340
  id: shams-azam22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11320
  lastpage: 11340
  published: 2022-05-03 00:00:00 +0000
- title: ' Feature Collapsing for Gaussian Process Variable Ranking '
  abstract: ' At present, there is no consensus on the most effective way to establish feature relevance for Gaussian process models. The most common heuristic, Automatic Relevance Determination, has several downsides; many alternate methods incur unacceptable computational costs. Existing methods based on sensitivity analysis of the posterior predictive distribution are promising, but are heavily biased and show room for improvement. This paper proposes Feature Collapsing as a novel method for performing GP feature relevance determination in an effective, unbiased, and computationally-inexpensive manner compared to existing algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/sebenius22a.html
  PDF: https://proceedings.mlr.press/v151/sebenius22a/sebenius22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-sebenius22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Isaac
    family: Sebenius
  - given: Topi
    family: Paananen
  - given: Aki
    family: Vehtari
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11341-11355
  id: sebenius22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11341
  lastpage: 11355
  published: 2022-05-03 00:00:00 +0000
- title: ' Private Sequential Hypothesis Testing for Statisticians: Privacy, Error Rates, and Sample Size '
  abstract: ' The sequential hypothesis testing problem is a class of statistical analyses where the sample size is not fixed in advance. Instead, the decision process takes in new observations sequentially to make real-time decisions for testing an alternative hypothesis against a null hypothesis until some stopping criterion is satisfied. In many common applications of sequential hypothesis testing, the data can be highly sensitive and may require privacy protection; for example, sequential hypothesis testing is used in clinical trials, where doctors sequentially collect data from patients and must determine when to stop recruiting patients and whether the treatment is effective. The field of differential privacy has been developed to offer data analysis tools with strong privacy guarantees, and has been commonly applied to machine learning and statistical tasks. In this work, we study the sequential hypothesis testing problem under a slight variant of differential privacy, known as Renyi differential privacy. We present a new private algorithm based on Wald’s Sequential Probability Ratio Test (SPRT) that also gives strong theoretical privacy guarantees. We provide theoretical analysis on statistical performance measured by Type I and Type II error as well as the expected sample size. We also empirically validate our theoretical results on several synthetic databases, showing that our algorithms also perform well in practice. Unlike previous work in private hypothesis testing that focused only on the classical fixed sample setting, our results in the sequential setting allow a conclusion to be reached much earlier, thus saving the cost of collecting additional samples. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhang22g.html
  PDF: https://proceedings.mlr.press/v151/zhang22g/zhang22g.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhang22g.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Wanrong
    family: Zhang
  - given: Yajun
    family: Mei
  - given: Rachel
    family: Cummings
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11356-11373
  id: zhang22g
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11356
  lastpage: 11373
  published: 2022-05-03 00:00:00 +0000
- title: ' FLIX: A Simple and Communication-Efficient Alternative to Local Methods in Federated Learning '
  abstract: ' Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to collaboratively learn under privacy, communication and multiple heterogeneity constraints. A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control. We identify several key desiderata in frameworks for federated learning and introduce a new framework, FLIX, that takes into account the unique challenges brought by federated learning. FLIX has a standard finite-sum form, which enables practitioners to tap into the immense wealth of existing (potentially non-local) methods for distributed optimization. Through a smart initialization that does not require any communication, FLIX does not require the use of local steps but is still provably capable of performing dissimilarity regularization on par with local methods. We give several algorithms for solving the FLIX formulation efficiently under communication constraints. Finally, we corroborate our theoretical results with extensive experimentation. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/gasanov22a.html
  PDF: https://proceedings.mlr.press/v151/gasanov22a/gasanov22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-gasanov22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Elnur
    family: Gasanov
  - given: Ahmed
    family: Khaled
  - given: Samuel
    family: Horváth
  - given: Peter
    family: Richtarik
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11374-11421
  id: gasanov22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11374
  lastpage: 11421
  published: 2022-05-03 00:00:00 +0000
- title: ' Data Appraisal Without Data Sharing '
  abstract: ' One of the most effective approaches to improving the performance of a machine learning model is to procure additional training data. A model owner seeking relevant training data from a data owner needs to appraise the data before acquiring it. However, without a formal agreement, the data owner does not want to share data. The resulting Catch-22 prevents efficient data markets from forming. This paper proposes adding a data appraisal stage that requires no data sharing between data owners and model owners. Specifically, we use multi-party computation to implement an appraisal function computed on private data. The appraised value serves as a guide to facilitate data selection and transaction. We propose an efficient data appraisal method based on forward influence functions that approximates data value through its first-order loss reduction on the current model. The method requires no additional hyper-parameters or re-training. We show that in private, forward influence functions provide an appealing trade-off between high quality appraisal and required computation, in spite of label noise, class imbalance, and missing data. Our work seeks to inspire an open market that incentivizes efficient, equitable exchange of domain-specific training data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/xu22e.html
  PDF: https://proceedings.mlr.press/v151/xu22e/xu22e.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-xu22e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xinlei
    family: Xu
  - given: Awni
    family: Hannun
  - given: Laurens
    family: Van Der Maaten
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11422-11437
  id: xu22e
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11422
  lastpage: 11437
  published: 2022-05-03 00:00:00 +0000
- title: ' On Global-view Based Defense via Adversarial Attack and Defense Risk Guaranteed Bounds '
  abstract: ' It is well known that deep neural networks (DNNs) are susceptible to adversarial attacks, which represent the most severe fragility of deep learning systems. Despite achieving impressive performance, most current state-of-the-art classifiers remain highly vulnerable to carefully crafted, imperceptible adversarial perturbations. Research efforts to understand neural network attack and defense have become increasingly urgent and important. While rapid progress has been made on this front, there is still an important theoretical gap in achieving guaranteed bounds on attack/defense models, leaving uncertainty in the quality and certified guarantees of these models. To this end, we systematically address this problem in this paper. More specifically, we formulate attack and defense in a generic setting where there exists a family of adversaries (i.e., attackers) for attacking a family of classifiers (i.e., defenders). We develop a novel class of f-divergences suitable for measuring divergence among multiple distributions. This equips us to study the interactions between attackers and defenders in a countervailing game where we formulate a joint risk on attack and defense schemes. This is followed by our key results on guaranteed upper and lower bounds on this risk that can provide a better understanding of the behaviors of those parties from the attack and defense perspectives, thereby having important implications for both the attack and defense sides. Finally, building on our theory, we propose an empirical approach based on a global view to defend against adversarial attacks. Experimental results on benchmark datasets show that the global view for attack/defense, if exploited appropriately, can help improve adversarial robustness. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/le22c.html
  PDF: https://proceedings.mlr.press/v151/le22c/le22c.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-le22c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Trung
    family: Le
  - given: Anh
    family: Tuan Bui
  - given: Le
    family: Minh Tri Tue
  - given: He
    family: Zhao
  - given: Paul
    family: Montague
  - given: Quan
    family: Tran
  - given: Dinh
    family: Phung
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11438-11460
  id: le22c
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11438
  lastpage: 11460
  published: 2022-05-03 00:00:00 +0000
- title: ' Transductive Robust Learning Guarantees '
  abstract: ' We study the problem of adversarially robust learning in the transductive setting. For classes H of bounded VC dimension, we propose a simple transductive learner that, when presented with a set of labeled training examples and a set of unlabeled test examples (both sets possibly adversarially perturbed), correctly labels the test examples with a robust error rate that is linear in the VC dimension and is adaptive to the complexity of the perturbation set. This result provides an exponential improvement in dependence on VC dimension over the best known upper bound on the robust error in the inductive setting, at the expense of competing with a more restrictive notion of optimal robust error. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/montasser22a.html
  PDF: https://proceedings.mlr.press/v151/montasser22a/montasser22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-montasser22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Omar
    family: Montasser
  - given: Steve
    family: Hanneke
  - given: Nathan
    family: Srebro
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11461-11471
  id: montasser22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11461
  lastpage: 11471
  published: 2022-05-03 00:00:00 +0000
- title: ' Online Competitive Influence Maximization '
  abstract: ' Online influence maximization has attracted much attention as a way to maximize influence spread through a social network while learning the values of unknown network parameters. Most previous works focus on single-item diffusion. In this paper, we introduce a new Online Competitive Influence Maximization (OCIM) problem, where two competing items (e.g., products, news stories) propagate in the same network and influence probabilities on edges are unknown. We adopt a combinatorial multi-armed bandit (CMAB) framework for OCIM, but unlike the non-competitive setting, the important monotonicity property (influence spread increases when influence probabilities on edges increase) no longer holds due to the competitive nature of propagation, which brings a significant new challenge to the problem. We provide a nontrivial proof showing that the Triggering Probability Modulated (TPM) condition for CMAB still holds in OCIM, which is instrumental for our proposed algorithms OCIM-TS and OCIM-OFU to achieve sublinear Bayesian and frequentist regret, respectively. We also design an OCIM-ETC algorithm that requires less feedback and easier offline computation, at the expense of a worse frequentist regret bound. Experimental evaluations demonstrate the effectiveness of our algorithms. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zuo22a.html
  PDF: https://proceedings.mlr.press/v151/zuo22a/zuo22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zuo22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jinhang
    family: Zuo
  - given: Xutong
    family: Liu
  - given: Carlee
    family: Joe-Wong
  - given: John C. S.
    family: Lui
  - given: Wei
    family: Chen
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11472-11502
  id: zuo22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11472
  lastpage: 11502
  published: 2022-05-03 00:00:00 +0000
- title: ' Adaptive Importance Sampling meets Mirror Descent : a Bias-variance Tradeoff '
  abstract: ' Adaptive importance sampling is a widespread Monte Carlo technique that uses a re-weighting strategy to iteratively estimate the so-called target distribution. A major drawback of adaptive importance sampling is the large variance of the weights, which is known to badly impact the accuracy of the estimates. This paper investigates a regularization strategy whose basic principle is to raise the importance weights to a certain power. This regularization parameter, which may vary between zero and one during the algorithm, is shown (i) to balance between the bias and the variance and (ii) to be connected to the mirror descent framework. Using a kernel density estimate to build the sampling policy, uniform convergence is established under mild conditions. Finally, several practical ways to choose the regularization parameter are discussed and the benefits of the proposed approach are illustrated empirically. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/korba22a.html
  PDF: https://proceedings.mlr.press/v151/korba22a/korba22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-korba22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Anna
    family: Korba
  - given: François
    family: Portier
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11503-11527
  id: korba22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11503
  lastpage: 11527
  published: 2022-05-03 00:00:00 +0000
- title: ' The role of optimization geometry in single neuron learning '
  abstract: ' Recent numerical experiments have demonstrated that the choice of optimization geometry used during training can impact generalization performance when learning expressive nonlinear model classes such as deep neural networks. These observations have important implications for modern deep learning, but remain poorly understood due to the difficulty of the associated nonconvex optimization. Towards an understanding of this phenomenon, we analyze a family of pseudogradient methods for learning generalized linear models under the square loss – a simplified problem containing both nonlinearity in the model parameters and nonconvexity of the optimization which admits a single neuron as a special case. We prove non-asymptotic bounds on the generalization error that sharply characterize how the interplay between the optimization geometry and the feature space geometry sets the out-of-sample performance of the learned model. Experimentally, selecting the optimization geometry as suggested by our theory leads to improved performance in generalized linear model estimation problems such as nonlinear and nonconvex variants of sparse vector recovery and low-rank matrix sensing. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/boffi22a.html
  PDF: https://proceedings.mlr.press/v151/boffi22a/boffi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-boffi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nicholas
    family: Boffi
  - given: Stephen
    family: Tu
  - given: Jean-Jacques
    family: Slotine
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11528-11549
  id: boffi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11528
  lastpage: 11549
  published: 2022-05-03 00:00:00 +0000
- title: ' Learning Tensor Representations for Meta-Learning '
  abstract: ' We introduce a tensor-based model of shared representation for meta-learning from a diverse set of tasks. Prior works on learning linear representations for meta-learning assume that there is a common shared representation across different tasks, and do not consider the additional task-specific observable side information. In this work, we model the meta-parameter through an order-$3$ tensor, which can adapt to the observed features of the task. We propose two methods to estimate the underlying tensor. The first method solves a tensor regression problem and works under natural assumptions on the data generating process. The second method uses the method of moments under additional distributional assumptions and has an improved sample complexity in terms of the number of tasks. We also focus on the meta-test phase, and consider estimating task-specific parameters on a new task. Substituting the estimated tensor from the first step allows us to estimate the task-specific parameters with very few samples of the new task, thereby showing the benefits of learning tensor representations for meta-learning. Finally, through simulation and several real-world datasets, we evaluate our methods and show that they improve over previous linear models of shared representations for meta-learning. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/deng22d.html
  PDF: https://proceedings.mlr.press/v151/deng22d/deng22d.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-deng22d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Samuel
    family: Deng
  - given: Yilin
    family: Guo
  - given: Daniel
    family: Hsu
  - given: Debmalya
    family: Mandal
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11550-11580
  id: deng22d
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11550
  lastpage: 11580
  published: 2022-05-03 00:00:00 +0000
- title: ' Differentially Private Densest Subgraph '
  abstract: ' Given a graph, the densest subgraph problem asks for a set of vertices such that the average degree among these vertices is maximized. Densest subgraph has numerous applications in learning, e.g., community detection in social networks, link spam detection, correlation mining, bioinformatics, and so on. Although there are efficient algorithms that output either exact or approximate solutions to the densest subgraph problem, existing algorithms may violate the privacy of the individuals in the network, e.g., leaking the existence/non-existence of edges. In this paper, we study the densest subgraph problem in the framework of differential privacy, and we derive upper and lower bounds for this problem. We show that there exists a linear-time $\epsilon$-differentially private algorithm that finds a 2-approximation of the densest subgraph with an extra poly-logarithmic additive error. Our algorithm not only reports the approximate density of the densest subgraph, but also reports the vertices that form the dense subgraph. Our upper bound almost matches the famous 2-approximation by Charikar both in performance and in approximation ratio, but we additionally achieve differential privacy. In comparison with Charikar’s algorithm, our algorithm has an extra poly-logarithmic additive error. We partly justify the additive error with a new lower bound, showing that for any differentially private algorithm that provides a constant-factor approximation, a sub-logarithmic additive error is inherent. We also study our differentially private algorithm empirically on real-world graphs, and we show that in practice the algorithm finds a solution that is very close to the optimal. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/farhadi22a.html
  PDF: https://proceedings.mlr.press/v151/farhadi22a/farhadi22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-farhadi22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alireza
    family: Farhadi
  - given: MohammadTaghi
    family: Hajiaghayi
  - given: Elaine
    family: Shi
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11581-11597
  id: farhadi22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11581
  lastpage: 11597
  published: 2022-05-03 00:00:00 +0000
- title: ' A Manifold View of Adversarial Risk '
  abstract: ' The adversarial risk of a machine learning model has been widely studied. Most previous works assume that the data lies in the whole ambient space. We propose to take a new angle and take the manifold assumption into consideration. Assuming data lies on a manifold, we investigate two new types of adversarial risk, the normal adversarial risk due to perturbation along the normal direction, and the in-manifold adversarial risk due to perturbation within the manifold. We prove that the classic adversarial risk can be bounded from both sides using the normal and in-manifold adversarial risks. We also show with a surprisingly pessimistic case that the standard adversarial risk can be nonzero even when both normal and in-manifold risks are zero. We conclude the paper with empirical studies supporting our theoretical results. Our results suggest the possibility of improving the robustness of a classifier by only focusing on the normal adversarial risk. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/zhang22h.html
  PDF: https://proceedings.mlr.press/v151/zhang22h/zhang22h.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-zhang22h.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Wenjia
    family: Zhang
  - given: Yikai
    family: Zhang
  - given: Xiaoling
    family: Hu
  - given: Mayank
    family: Goswami
  - given: Chao
    family: Chen
  - given: Dimitris N.
    family: Metaxas
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11598-11614
  id: zhang22h
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11598
  lastpage: 11614
  published: 2022-05-03 00:00:00 +0000
- title: ' Statistical and computational thresholds for the planted k-densest sub-hypergraph problem '
  abstract: ' In this work, we consider the problem of recovering a planted k-densest sub-hypergraph on d-uniform hypergraphs. This fundamental problem appears in different contexts, e.g., community detection, average-case complexity, and neuroscience applications as a structural variant of the tensor-PCA problem. We provide tight information-theoretic upper and lower bounds for the exact recovery threshold by the maximum-likelihood estimator, as well as algorithmic bounds based on approximate message passing algorithms. The problem exhibits a typical statistical-to-computational gap observed in analogous sparse settings, which widens with increasing sparsity of the problem. The bounds show that the signal structure impacts the location of the statistical and computational phase transitions, an effect that the known bounds for the tensor-PCA model do not capture. This effect is due to the generic planted signal prior that this latter model addresses. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/corinzia22a.html
  PDF: https://proceedings.mlr.press/v151/corinzia22a/corinzia22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-corinzia22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Luca
    family: Corinzia
  - given: Paolo
    family: Penna
  - given: Wojciech
    family: Szpankowski
  - given: Joachim
    family: Buhmann
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11615-11640
  id: corinzia22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11615
  lastpage: 11640
  published: 2022-05-03 00:00:00 +0000
- title: ' Controlling Epidemic Spread using Probabilistic Diffusion Models on Networks '
  abstract: ' The spread of an epidemic is often modeled by an SIR random process on a social network graph. The MinInfEdge problem for optimal social distancing involves minimizing the expected number of infections when we are allowed to break at most B edges; similarly, the MinInfNode problem involves removing at most B vertices. These are fundamental problems in epidemiology and network science. While a number of heuristics have been considered, the complexity of these problems remains generally open. In this paper, we present two bicriteria approximation algorithms for the MinInfEdge problem, which give the first non-trivial approximations for this problem. The first is based on the cut sparsification technique of Karger, which works for any graph when the transmission probabilities are not too small. The second is a Sample Average Approximation (SAA) based algorithm, which we analyze for the Chung-Lu random graph model. We also extend some of our results to the MinInfNode problem. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/babay22a.html
  PDF: https://proceedings.mlr.press/v151/babay22a/babay22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-babay22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Amy E.
    family: Babay
  - given: Michael
    family: Dinitz
  - given: Aravind
    family: Srinivasan
  - given: Leonidas
    family: Tsepenekas
  - given: Anil
    family: Vullikanti
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11641-11654
  id: babay22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11641
  lastpage: 11654
  published: 2022-05-03 00:00:00 +0000
- title: ' Equivariant Deep Dynamical Model for Motion Prediction '
  abstract: ' Learning representations through deep generative modeling is a powerful approach for dynamical modeling to discover the most simplified and compressed underlying description of the data, to then use it for other tasks such as prediction. Most learning tasks have intrinsic symmetries, i.e., the input transformations leave the output unchanged, or the output undergoes a similar transformation. The learning process is, however, usually uninformed of these symmetries. Therefore, the learned representations for individually transformed inputs may not be meaningfully related. In this paper, we propose an SO(3) equivariant deep dynamical model (EqDDM) for motion prediction that learns a structured representation of the input space in the sense that the embedding varies with symmetry transformations. EqDDM is equipped with equivariant networks to parameterize the state-space emission and transition models. We demonstrate the superior predictive performance of the proposed model on various motion data. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/azari22a.html
  PDF: https://proceedings.mlr.press/v151/azari22a/azari22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-azari22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Bahar
    family: Azari
  - given: Deniz
    family: Erdogmus
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11655-11668
  id: azari22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11655
  lastpage: 11668
  published: 2022-05-03 00:00:00 +0000
- title: ' The Fast Kernel Transform '
  abstract: ' Kernel methods are a highly effective and widely used collection of modern machine learning algorithms. A fundamental limitation of virtually all such methods is that computations involving the kernel matrix naively scale quadratically (e.g., matrix-vector multiplication) or cubically (solving linear systems) with the size of the dataset N. We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications (MVMs) for datasets in moderate dimensions with quasilinear complexity. Typically, analytically grounded fast multiplication methods require specialized development for specific kernels. In contrast, our scheme is based on auto-differentiation and automated symbolic computations that leverage the analytical structure of the underlying kernel. This allows the FKT to be easily applied to a broad class of kernels, including Gaussian, Matern, and Rational Quadratic covariance functions and Green’s functions, including those of the Laplace and Helmholtz equations. Furthermore, the FKT maintains a high, quantifiable, and controllable level of accuracy, properties that many acceleration methods lack. We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks with comparisons to adjacent methods, and by applying it to scale the stochastic neighborhood embedding (t-SNE) and Gaussian processes to large real-world datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ryan22a.html
  PDF: https://proceedings.mlr.press/v151/ryan22a/ryan22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ryan22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: John P.
    family: Ryan
  - given: Sebastian E.
    family: Ament
  - given: Carla P.
    family: Gomes
  - given: Anil
    family: Damle
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11669-11690
  id: ryan22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11669
  lastpage: 11690
  published: 2022-05-03 00:00:00 +0000
- title: ' Outlier-Robust Optimal Transport: Duality, Structure, and Statistical Analysis '
  abstract: ' The Wasserstein distance, rooted in optimal transport (OT) theory, is a popular discrepancy measure between probability distributions with various applications to statistics and machine learning. Despite their rich structure and demonstrated utility, Wasserstein distances are sensitive to outliers in the considered distributions, which hinders applicability in practice. We propose a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from each contaminated distribution. Under standard moment assumptions, $\mathsf{W}_p^\varepsilon$ is shown to be minimax optimal for robust estimation under the Huber $\varepsilon$-contamination model. Our formulation of this robust distance amounts to a highly regular optimization problem that lends itself better to analysis than previously considered frameworks. Leveraging this, we conduct a thorough theoretical study of $\mathsf{W}_p^\varepsilon$, encompassing robustness guarantees, characterization of optimal perturbations, regularity, duality, and statistical estimation. In particular, by decoupling the optimization variables, we arrive at a simple dual form for $\mathsf{W}_p^\varepsilon$ that can be implemented via an elementary modification to standard, duality-based OT solvers. We illustrate the virtues of our framework via applications to generative modeling with contaminated datasets. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/nietert22a.html
  PDF: https://proceedings.mlr.press/v151/nietert22a/nietert22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-nietert22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sloan
    family: Nietert
  - given: Ziv
    family: Goldfeld
  - given: Rachel
    family: Cummings
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11691-11719
  id: nietert22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11691
  lastpage: 11719
  published: 2022-05-03 00:00:00 +0000
- title: ' Diversity and Generalization in Neural Network Ensembles '
  abstract: ' Ensembles are widely used in machine learning and usually provide state-of-the-art performance in many prediction tasks. From the very beginning, the diversity of an ensemble has been identified as a key factor for the superior performance of these models. But the exact role that diversity plays in ensemble models is poorly understood, especially in the context of neural networks. In this work, we combine and expand previously published results in a theoretically sound framework that describes the relationship between diversity and ensemble performance for a wide range of ensemble methods. More precisely, we provide sound answers to the following questions: how to measure diversity, how diversity relates to the generalization error of an ensemble, and how diversity is promoted by neural network ensemble algorithms. This analysis covers three widely used loss functions, namely, the squared loss, the cross-entropy loss, and the 0-1 loss; and two widely used model combination strategies, namely, model averaging and weighted majority vote. We empirically validate this theoretical analysis with neural network ensembles. '
  volume: 151
  URL: https://proceedings.mlr.press/v151/ortega22a.html
  PDF: https://proceedings.mlr.press/v151/ortega22a/ortega22a.pdf
  edit: https://github.com/mlresearch//v151/edit/gh-pages/_posts/2022-05-03-ortega22a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 25th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Luis A.
    family: Ortega
  - given: Rafael
    family: Cabañas
  - given: Andres
    family: Masegosa
  editor: 
  - given: Gustau
    family: Camps-Valls
  - given: Francisco J. R.
    family: Ruiz
  - given: Isabel
    family: Valera
  page: 11720-11743
  id: ortega22a
  issued:
    date-parts: 
      - 2022
      - 5
      - 3
  firstpage: 11720
  lastpage: 11743
  published: 2022-05-03 00:00:00 +0000