- title: 'The 35th Uncertainty in Artificial Intelligence Conference: Preface' abstract: 'The Conference on Uncertainty in Artificial Intelligence (UAI) is the premier international conference on research related to representation, inference, learning and decision making in the presence of uncertainty within the field of Artificial Intelligence. This volume contains all papers that were accepted for the 35th UAI Conference, held in Tel Aviv, Israel from July 22 to 25, 2019.' volume: 115 URL: https://proceedings.mlr.press/v115/adams20a.html PDF: http://proceedings.mlr.press/v115/adams20a/adams20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-adams20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Ryan family: Adams - given: Vibhav family: Gogate editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1-17 id: adams20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1 lastpage: 17 published: 2020-08-06 00:00:00 +0000 - title: 'Personalized Peer Truth Serum for Eliciting Multi-Attribute Personal Data' abstract: 'Several peer consistency mechanisms have been proposed to incentivize agents to honestly solve crowdsourcing tasks. These game-theoretic mechanisms evaluate the answers provided by an agent based on the correlation with answers provided by other agents ("peers") who solve the same tasks. In this paper, we consider the problem of eliciting personal attributes (e.g., body measurements) of the agents. Since attributes are personal in nature, the tasks cannot be shared between two agents. We show for the first time how to extend a peer consistency incentive mechanism, the Logarithmic Peer Truth Serum, to this setting for collecting personal attributes. When individuals report combinations of multiple personal data attributes, the correlation between them can be exploited to find peers. This new mechanism applies, for example, to collecting personal health records and other multi-attribute measurements in private settings such as smart homes. We provide a theoretical analysis of the incentive properties of the new mechanism and evaluate its performance on several public datasets; the results confirm the theoretical analysis.' volume: 115 URL: https://proceedings.mlr.press/v115/goel20a.html PDF: http://proceedings.mlr.press/v115/goel20a/goel20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-goel20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Naman family: Goel - given: Boi family: Faltings editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 18-27 id: goel20a issued: date-parts: - 2020 - 8 - 6 firstpage: 18 lastpage: 27 published: 2020-08-06 00:00:00 +0000 - title: 'Conditional Expectation Propagation' abstract: 'Expectation propagation (EP) is a powerful approximate inference algorithm. However, a critical barrier in applying EP is that the moment matching in message updates can be intractable. Handcrafting approximations is usually tricky and lacks generalizability. Importance sampling is very expensive. While Laplace propagation offers an excellent solution, it has to run numerical optimizations to find Laplace approximations in every update, which is still quite inefficient.
To overcome these practical barriers, we propose conditional expectation propagation (CEP), which performs conditional moment matching with the variables outside each message held fixed, and then takes the expectation w.r.t. their approximate posterior. The conditional moments are often analytical and much easier to derive. In the most general case, we can use (fully) factorized messages so that the conditional moments can be represented by quadrature formulas. We then compute the expectation of the conditional moments via Taylor approximations when necessary. In this way, our algorithm can always conduct efficient, analytical fixed-point iterations. Experiments on several popular models for which standard EP is available or unavailable demonstrate the advantages of CEP in both inference quality and computational efficiency.' volume: 115 URL: https://proceedings.mlr.press/v115/wang20a.html PDF: http://proceedings.mlr.press/v115/wang20a/wang20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-wang20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Zheng family: Wang - given: Shandian family: Zhe editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 28-37 id: wang20a issued: date-parts: - 2020 - 8 - 6 firstpage: 28 lastpage: 37 published: 2020-08-06 00:00:00 +0000 - title: 'A Sparse Representation-Based Approach to Linear Regression with Partially Shuffled Labels' abstract: 'Several recent papers have discussed a modification of linear regression in which the correspondence between input variables and labels is missing or erroneous, referred to as "Linear Regression with Unknown Permutation" or "Linear Regression with Shuffled Data". Prior studies of this setup have shed light on the associated statistical limits. However, practical and well-founded approaches that overcome the computational challenges of the setup are still scarce. In this paper, we translate the problem of linear regression with unknown permutation to a robust subspace recovery problem, or, alternatively, an outlier recovery problem. Outliers correspond to mismatched data, and a specific sparse representation of the data is used to detect the outliers. The proposed approach requires at least a reasonably large fraction (but potentially significantly less than 50%) of the response-predictor pairs to be in correct correspondence. This assumption is often justified, e.g., if record linkage based on additional contextual information is sufficiently accurate to enable correct matching of such a fraction of the data. It turns out that in this situation, estimation of the regression parameter on the one hand and recovery of the underlying permutation on the other hand can be decoupled, so that the computational hardness associated with the latter can be sidestepped.' volume: 115 URL: https://proceedings.mlr.press/v115/slawski20a.html PDF: http://proceedings.mlr.press/v115/slawski20a/slawski20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-slawski20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Martin family: Slawski - given: Mostafa family: Rahmani - given: Ping family: Li editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 38-48 id: slawski20a issued: date-parts: - 2020 - 8 - 6 firstpage: 38 lastpage: 48 published: 2020-08-06 00:00:00 +0000 - title: 'On Fast Convergence of Proximal Algorithms for SQRT-Lasso Optimization: Don’t Worry About its Nonsmooth Loss Function' abstract: 'Many machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility. However, by exploring the modeling structures, we find these “sacrifices” do not always require more computational effort. To shed light on such a “free-lunch” phenomenon, we study the square-root-Lasso (SQRT-Lasso) type regression problem. Specifically, we show that the nonsmooth loss functions of SQRT-Lasso type regression ease tuning effort and gain adaptivity to inhomogeneous noise, but are not necessarily more challenging than Lasso in computation. We can directly apply proximal algorithms (e.g. proximal gradient descent, proximal Newton, and proximal quasi-Newton algorithms) without worrying about the nonsmoothness of the loss function. Theoretically, we prove that the proximal algorithms combined with the pathwise optimization scheme enjoy fast convergence guarantees with high probability. Numerical results are provided to support our theory.' volume: 115 URL: https://proceedings.mlr.press/v115/li20a.html PDF: http://proceedings.mlr.press/v115/li20a/li20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-li20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Xinguo family: Li - given: Haoming family: Jiang - given: Jarvis family: Haupt - given: Raman family: Arora - given: Han family: Liu - given: Mingyi family: Hong - given: Tuo family: Zhao editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 49-59 id: li20a issued: date-parts: - 2020 - 8 - 6 firstpage: 49 lastpage: 59 published: 2020-08-06 00:00:00 +0000 - title: 'Correlated Learning for Aggregation Systems' abstract: 'Aggregation systems (e.g., Uber, Lyft, FoodPanda, Deliveroo) have been increasingly used to improve efficiency in numerous environments, including transportation, logistics, and food and grocery delivery. In these systems, a centralized entity (e.g., Uber) aggregates supply and assigns it to demand so as to optimize a central metric such as profit, number of requests, delay, etc. Since the system optimizes a metric of importance to the centralized entity, the interests of individuals (e.g., drivers, delivery boys) can be sacrificed. Therefore, in this paper, we focus on the problem of serving individual interests, i.e., learning revenue-maximizing policies for individuals in the presence of a self-interested central entity. Since there is a large number of homogeneous learning agents, we represent the problem as an Anonymous Multi-Agent Reinforcement Learning (AyMARL) problem. By using the self-interested centralized entity as a correlation entity, we provide a novel learning mechanism that helps individual agents to maximize their individual revenue. Our Correlated Learning (CL) algorithm is able to outperform existing mechanisms on a generic simulator for aggregation systems and multiple other benchmark Multi-Agent Reinforcement Learning (MARL) problems.'
volume: 115 URL: https://proceedings.mlr.press/v115/verma20a.html PDF: http://proceedings.mlr.press/v115/verma20a/verma20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-verma20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Tanvi family: Verma - given: Pradeep family: Varakantham editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 60-70 id: verma20a issued: date-parts: - 2020 - 8 - 6 firstpage: 60 lastpage: 70 published: 2020-08-06 00:00:00 +0000 - title: 'Causal Calculus in the Presence of Cycles, Latent Confounders and Selection Bias' abstract: 'We prove the main rules of causal calculus (also called do-calculus) for i/o structural causal models (ioSCMs), a generalization of a recently proposed general class of non-linear structural causal models that allow for cycles, latent confounders and arbitrary probability distributions. We also generalize adjustment criteria and formulas from the acyclic setting to the general one (i.e. ioSCMs). Such criteria then allow one to estimate (conditional) causal effects from observational data that was (partially) gathered under selection bias and cycles. This generalizes the backdoor criterion, the selection-backdoor criterion and extensions of these to arbitrary ioSCMs. Together, our results thus enable causal reasoning in the presence of cycles, latent confounders and selection bias. Finally, we extend the ID algorithm for the identification of causal effects to ioSCMs.' volume: 115 URL: https://proceedings.mlr.press/v115/forre20a.html PDF: http://proceedings.mlr.press/v115/forre20a/forre20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-forre20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Patrick family: Forré - given: Joris M. family: Mooij editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 71-80 id: forre20a issued: date-parts: - 2020 - 8 - 6 firstpage: 71 lastpage: 80 published: 2020-08-06 00:00:00 +0000 - title: 'Variational Regret Bounds for Reinforcement Learning' abstract: 'We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where \textit{both} the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.' volume: 115 URL: https://proceedings.mlr.press/v115/ortner20a.html PDF: http://proceedings.mlr.press/v115/ortner20a/ortner20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-ortner20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Ronald family: Ortner - given: Pratik family: Gajane - given: Peter family: Auer editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 81-90 id: ortner20a issued: date-parts: - 2020 - 8 - 6 firstpage: 81 lastpage: 90 published: 2020-08-06 00:00:00 +0000 - title: 'Recommendation from Raw Data with Adaptive Compound Poisson Factorization' abstract: 'Count data are often used in recommender systems: they are widespread (song play counts, product purchases, clicks on web pages) and can reveal user preference without any explicit rating from the user. Such data are known to be sparse, over-dispersed and bursty, which makes their direct use in recommender systems challenging, often leading to pre-processing steps such as binarization. The aim of this paper is to build recommender systems from these raw data, by means of the recently proposed compound Poisson Factorization (cPF). The paper''s contributions are three-fold: we present a unified framework for discrete data (dcPF), leading to an adaptive and scalable algorithm; we show that our framework achieves a trade-off between Poisson Factorization (PF) applied to raw and binarized data; we study four specific instances that are relevant to recommendation and exhibit new links with combinatorics. Experiments with three different datasets show that dcPF is able to effectively adjust to over-dispersion, leading to better recommendation scores when compared with PF on either raw or binarized data.' volume: 115 URL: https://proceedings.mlr.press/v115/gouvert20a.html PDF: http://proceedings.mlr.press/v115/gouvert20a/gouvert20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-gouvert20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Olivier family: Gouvert - given: Thomas family: Oberlin - given: Cédric family: Févotte editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 91-101 id: gouvert20a issued: date-parts: - 2020 - 8 - 6 firstpage: 91 lastpage: 101 published: 2020-08-06 00:00:00 +0000 - title: 'One-Shot Marginal MAP Inference in Markov Random Fields' abstract: 'Statistical inference in Markov random fields (MRFs) is NP-hard in all but the simplest of cases. As a result, many algorithms, particularly in the case of discrete random variables, have been developed to perform approximate inference in practice. However, most of these methods scale poorly, cannot be applied to continuous random variables, or are too slow to be used in situations that call for repeated statistical inference on the same model. In this work, we propose a novel variational inference strategy that is flexible enough to handle both continuous and discrete random variables, efficient enough to handle repeated statistical inferences, and scalable enough, via modern GPUs, to be practical on MRFs with hundreds of thousands of random variables. We prove that our approach overcomes weaknesses of the current approaches and demonstrate the efficacy of our approach on both synthetic models and real-world applications.'
volume: 115 URL: https://proceedings.mlr.press/v115/xiong20a.html PDF: http://proceedings.mlr.press/v115/xiong20a/xiong20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-xiong20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Hao family: Xiong - given: Yuanzhen family: Guo - given: Yibo family: Yang - given: Nicholas family: Ruozzi editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 102-112 id: xiong20a issued: date-parts: - 2020 - 8 - 6 firstpage: 102 lastpage: 112 published: 2020-08-06 00:00:00 +0000 - title: 'Truly Proximal Policy Optimization' abstract: 'Proximal policy optimization (PPO) is one of the most successful deep reinforcement learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from being fully understood. In this paper, we show that PPO can neither strictly restrict the probability ratio as it attempts to do nor enforce a well-defined trust region constraint, which means that it may still suffer from the risk of performance instability. To address this issue, we present an enhanced PPO method, named Trust Region-based PPO with Rollback (TR-PPO-RB). Two critical improvements are made in our method: 1) it adopts a new clipping function to support a rollback behavior to restrict the ratio between the new policy and the old one; 2) the triggering condition for clipping is replaced with a trust region-based one, which is theoretically justified according to the trust region theorem. By adhering more truly to the “proximal” property, i.e., restricting the policy within the trust region, the new algorithm appears to improve the original PPO in both stability and sample efficiency.' volume: 115 URL: https://proceedings.mlr.press/v115/wang20b.html PDF: http://proceedings.mlr.press/v115/wang20b/wang20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-wang20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Yuhui family: Wang - given: Hao family: He - given: Xiaoyang family: Tan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 113-122 id: wang20b issued: date-parts: - 2020 - 8 - 6 firstpage: 113 lastpage: 122 published: 2020-08-06 00:00:00 +0000 - title: 'Learning Factored Markov Decision Processes with Unawareness' abstract: 'Methods for learning and planning in sequential decision problems often assume the learner is aware of all possible states and actions in advance. This assumption is sometimes untenable. In this paper, we give a method to learn factored Markov decision processes from both domain exploration and expert assistance, which guarantees convergence to near-optimal behaviour, even when the agent begins unaware of factors critical to success. Our experiments show our agent learns optimal behaviour on both small and large problems, and that conserving information upon discovering new possibilities results in faster convergence.'
volume: 115 URL: https://proceedings.mlr.press/v115/innes20a.html PDF: http://proceedings.mlr.press/v115/innes20a/innes20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-innes20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Craig family: Innes - given: Alex family: Lascarides editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 123-133 id: innes20a issued: date-parts: - 2020 - 8 - 6 firstpage: 123 lastpage: 133 published: 2020-08-06 00:00:00 +0000 - title: 'Expressive Priors in Bayesian Neural Networks: Kernel Combinations and Periodic Functions' abstract: 'A simple, flexible approach to creating expressive priors in Gaussian process (GP) models makes new kernels from a combination of basic kernels, e.g., summing a periodic and a linear kernel can capture seasonal variation with a long-term trend. Despite a well-studied link between GPs and Bayesian neural networks (BNNs), the BNN analogue of this has not yet been explored. This paper derives BNN architectures mirroring such kernel combinations. Furthermore, it shows how BNNs can produce periodic kernels, which are often useful in this context. These ideas provide a principled approach to designing BNNs that incorporate prior knowledge about a function. We showcase the practical value of these ideas with illustrative experiments in supervised and reinforcement learning settings.' volume: 115 URL: https://proceedings.mlr.press/v115/pearce20a.html PDF: http://proceedings.mlr.press/v115/pearce20a/pearce20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-pearce20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Tim family: Pearce - given: Russell family: Tsuchida - given: Mohamed family: Zaki - given: Alexandra family: Brintrup - given: Andy family: Neely editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 134-144 id: pearce20a issued: date-parts: - 2020 - 8 - 6 firstpage: 134 lastpage: 144 published: 2020-08-06 00:00:00 +0000 - title: 'Countdown Regression: Sharp and Calibrated Survival Predictions' abstract: 'Probabilistic survival predictions (i.e. personalized survival curves) from models trained with Maximum Likelihood Estimation (MLE) can have high, and sometimes unacceptably high, variance. The field of meteorology, where the paradigm of maximizing sharpness subject to calibration is popular, has addressed this problem by using scoring rules beyond MLE, such as the Continuous Ranked Probability Score (CRPS). In this paper we present the \emph{Survival-CRPS}, a generalization of the CRPS to the survival prediction setting, with right-censored and interval-censored variants. We evaluate our ideas on the mortality prediction task using two different Electronic Health Record (EHR) data sets (STARR and MIMIC-III) covering millions of patients, with suitable deep neural network architectures: a Recurrent Neural Network (RNN) for STARR and a Fully Connected Network (FCN) for MIMIC-III. We compare results between the two different scoring rules while keeping the network architecture and data fixed, and show that models trained with Survival-CRPS result in sharper predictive distributions compared to those trained by MLE, while still maintaining calibration.'
volume: 115 URL: https://proceedings.mlr.press/v115/avati20a.html PDF: http://proceedings.mlr.press/v115/avati20a/avati20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-avati20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Anand family: Avati - given: Tony family: Duan - given: Sharon family: Zhou - given: Kenneth family: Jung - given: Nigam H. family: Shah - given: Andrew Y. family: Ng editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 145-155 id: avati20a issued: date-parts: - 2020 - 8 - 6 firstpage: 145 lastpage: 155 published: 2020-08-06 00:00:00 +0000 - title: 'Reducing Exploration of Dying Arms in Mortal Bandits' abstract: 'Mortal bandits have proven to be extremely useful for providing news article recommendations, running automated online advertising campaigns, and for other applications where the set of available options changes over time. Previous work on this problem showed how to regulate exploration of new arms when they have recently appeared, but it does not adapt when the arms are about to disappear. Since in most applications we can determine either exactly or approximately when arms will disappear, we can leverage this information to improve performance: we should not be exploring arms that are about to disappear. We provide adapted algorithms, regret bounds, and experiments, showing a clear benefit from regulating greed (exploration/exploitation) for arms that will soon disappear. We illustrate numerical performance on the Yahoo! Front Page Today Module User Click Log Dataset.' volume: 115 URL: https://proceedings.mlr.press/v115/traca20a.html PDF: http://proceedings.mlr.press/v115/traca20a/traca20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-traca20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Stefano family: Tracà - given: Cynthia family: Rudin - given: Weiyu family: Yan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 156-163 id: traca20a issued: date-parts: - 2020 - 8 - 6 firstpage: 156 lastpage: 163 published: 2020-08-06 00:00:00 +0000 - title: 'Comparing EM with GD in Mixture Models of Two Components' abstract: 'The expectation-maximization (EM) algorithm has been widely used in minimizing the negative log likelihood (also known as cross entropy) of mixture models. However, little is understood about the goodness of the fixed points it converges to. In this paper, we study the regions where one component is missing in two-component mixture models, which we call one-cluster regions. We analyze the propensity of such regions to trap EM and gradient descent (GD) for mixtures of two Gaussians and mixtures of two Bernoullis. In the case of Gaussian mixtures, EM escapes one-cluster regions exponentially fast, while GD escapes them linearly fast. In the case of mixtures of Bernoullis, we find that there exist one-cluster regions that are stable for GD and therefore trap GD, but those regions are unstable for EM, allowing EM to escape. Those regions are local minima that appear universally in experiments and can be arbitrarily bad. This work implies that EM is less likely than GD to converge to certain bad local optima in mixture models.
' volume: 115 URL: https://proceedings.mlr.press/v115/zhang20a.html PDF: http://proceedings.mlr.press/v115/zhang20a/zhang20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-zhang20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Guojun family: Zhang - given: Pascal family: Poupart - given: George family: Trimponias editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 164-174 id: zhang20a issued: date-parts: - 2020 - 8 - 6 firstpage: 164 lastpage: 174 published: 2020-08-06 00:00:00 +0000 - title: 'Efficient Search-Based Weighted Model Integration' abstract: 'Weighted model integration (WMI) extends weighted model counting (WMC) to the integration of functions over mixed discrete-continuous domains. It has shown tremendous promise for solving inference problems in graphical models and probabilistic programming. Yet, state-of-the-art tools for WMI are limited in terms of performance and ignore the independence structure that is crucial to improving efficiency. To address this limitation, we propose an efficient model integration algorithm for theories with tree primal graphs. We exploit the sparse graph structure by using search to perform integration. Our algorithm greatly improves the computational efficiency on such problems and exploits context-specific independence between variables. Experimental results show dramatic speedups compared to existing WMI solvers on problems with tree-shaped dependencies.' volume: 115 URL: https://proceedings.mlr.press/v115/zeng20a.html PDF: http://proceedings.mlr.press/v115/zeng20a/zeng20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-zeng20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Zhe family: Zeng - given: Guy prefix: Van den family: Broeck editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 175-185 id: zeng20a issued: date-parts: - 2020 - 8 - 6 firstpage: 175 lastpage: 185 published: 2020-08-06 00:00:00 +0000 - title: 'Causal Discovery with General Non-Linear Relationships using Non-Linear ICA' abstract: 'We consider the problem of inferring causal relationships between two or more passively observed variables. While the problem of such causal discovery has been extensively studied, especially in the bivariate setting, the majority of current methods assume a linear causal relationship, and the few methods which consider non-linear dependencies usually make the assumption of additive noise. Here, we propose a framework through which we can perform causal discovery in the presence of general non-linear relationships. The proposed method is based on recent progress in non-linear independent component analysis and exploits the non-stationarity of observations in order to recover the underlying sources or latent disturbances. We show rigorously that in the case of bivariate causal discovery, such non-linear ICA can be used to infer the causal direction via a series of independence tests. We further propose an alternative measure of causal direction based on asymptotic approximations to the likelihood ratio, as well as an extension to multivariate causal discovery.
We demonstrate the capabilities of the proposed method via a series of simulation studies and conclude with an application to neuroimaging data. ' volume: 115 URL: https://proceedings.mlr.press/v115/monti20a.html PDF: http://proceedings.mlr.press/v115/monti20a/monti20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-monti20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Ricardo Pio family: Monti - given: Kun family: Zhang - given: Aapo family: Hyvärinen editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 186-195 id: monti20a issued: date-parts: - 2020 - 8 - 6 firstpage: 186 lastpage: 195 published: 2020-08-06 00:00:00 +0000 - title: 'BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback' abstract: 'In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists. Learning to rank has traditionally been studied in two settings. In the offline setting, rankers are typically learned from relevance labels created by judges. This approach has generally become standard in industrial applications of ranking, such as search. However, this approach lacks exploration and thus is limited by the information content of the offline training data. In the online setting, an algorithm can experiment with lists and learn from feedback on them in a sequential fashion. Bandit algorithms are well-suited for this setting but they tend to learn user preferences from scratch, which results in a high initial cost of exploration. This poses an additional challenge of safe exploration in ranked lists. We propose BubbleRank, a bandit algorithm for safe re-ranking that combines the strengths of both the offline and online settings. The algorithm starts with an initial base list and improves it online by gradually exchanging higher-ranked less attractive items for lower-ranked more attractive items. We prove an upper bound on the n-step regret of BubbleRank that degrades gracefully with the quality of the initial base list. Our theoretical findings are supported by extensive experiments on a large-scale real-world click dataset.' volume: 115 URL: https://proceedings.mlr.press/v115/li20b.html PDF: http://proceedings.mlr.press/v115/li20b/li20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-li20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Chang family: Li - given: Branislav family: Kveton - given: Tor family: Lattimore - given: Ilya family: Markov - given: Maarten family: de Rijke - given: Csaba family: Szepesvári - given: Masrour family: Zoghi editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 196-206 id: li20b issued: date-parts: - 2020 - 8 - 6 firstpage: 196 lastpage: 206 published: 2020-08-06 00:00:00 +0000 - title: 'Coordinating Users of Shared Facilities via Data-driven Predictive Assistants and Game Theory' abstract: 'We study data-driven assistants that provide congestion forecasts to users of shared facilities (roads, cafeterias, etc.), to support coordination between them, and increase efficiency of such collective systems. 
Key questions are: (1) when and how much can (accurate) predictions help for coordination, and (2) which assistant algorithms reach optimal predictions? First, we lay the conceptual groundwork for this setting, where user preferences are a priori unknown and predictions influence outcomes. Addressing (1), we establish conditions under which self-fulfilling prophecies, i.e., “perfect” (probabilistic) predictions of what will happen, solve the coordination problem in the game-theoretic sense of selecting a Bayesian Nash equilibrium (BNE). Next, we prove that such prophecies exist even in large-scale settings where only aggregated statistics about users are available. This entails a new (nonatomic) BNE existence result. Addressing (2), we propose two assistant algorithms that sequentially learn from users’ reactions, together with optimality/convergence guarantees. We validate one of them in a large real-world experiment.' volume: 115 URL: https://proceedings.mlr.press/v115/geiger20a.html PDF: http://proceedings.mlr.press/v115/geiger20a/geiger20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-geiger20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Philipp family: Geiger - given: Michel family: Besserve - given: Justus family: Winkelmann - given: Claudius family: Proissl - given: Bernhard family: Schölkopf editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 207-216 id: geiger20a issued: date-parts: - 2020 - 8 - 6 firstpage: 207 lastpage: 216 published: 2020-08-06 00:00:00 +0000 - title: 'The Incomplete Rosetta Stone problem: Identifiability results for Multi-view Nonlinear ICA' abstract: 'We consider the problem of recovering a common latent source with independent components from multiple views. This applies to settings in which a variable is measured with multiple experimental modalities, and where the goal is to synthesize the disparate measurements into a single unified representation. We consider the case in which the observed views are a nonlinear mixing of component-wise corruptions of the sources. When the views are considered separately, this reduces to nonlinear Independent Component Analysis (ICA), for which it is provably impossible to undo the mixing. We present novel identifiability proofs that this is possible when the multiple views are considered jointly, showing that the mixing can theoretically be undone using function approximators such as deep neural networks. In contrast to known identifiability results for nonlinear ICA, we prove that independent latent sources with arbitrary mixing can be recovered as long as multiple, sufficiently different noisy views are available.' volume: 115 URL: https://proceedings.mlr.press/v115/gresele20a.html PDF: http://proceedings.mlr.press/v115/gresele20a/gresele20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-gresele20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Luigi family: Gresele - given: Paul K. family: Rubenstein - given: Arash family: Mehrjou - given: Francesco family: Locatello - given: Bernhard family: Schölkopf editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 217-227 id: gresele20a issued: date-parts: - 2020 - 8 - 6 firstpage: 217 lastpage: 227 published: 2020-08-06 00:00:00 +0000 - title: 'Random Clique Covers for Graphs with Local Density and Global Sparsity' abstract: 'Large real-world graphs tend to be sparse, but they often contain many densely connected subgraphs and exhibit high clustering coefficients. While recent random graph models can capture this sparsity, they ignore the local density, or vice versa. We develop a Bayesian nonparametric graph model based on random edge clique covers, and show that this model can capture a power-law degree distribution, global sparsity and a non-vanishing local clustering coefficient. This distribution can be used directly as a prior on observed graphs, or as part of a hierarchical Bayesian model for inferring latent graph structures.' volume: 115 URL: https://proceedings.mlr.press/v115/williamson20a.html PDF: http://proceedings.mlr.press/v115/williamson20a/williamson20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-williamson20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Sinead A. family: Williamson - given: Mauricio family: Tec editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 228-238 id: williamson20a issued: date-parts: - 2020 - 8 - 6 firstpage: 228 lastpage: 238 published: 2020-08-06 00:00:00 +0000 - title: 'Randomized Iterative Algorithms for Fisher Discriminant Analysis' abstract: 'Fisher discriminant analysis (FDA) is a widely used method for classification and dimensionality reduction. When the number of predictor variables greatly exceeds the number of observations, one of the alternatives to conventional FDA is regularized Fisher discriminant analysis (RFDA). In this paper, we present a simple, iterative, sketching-based algorithm for RFDA that comes with provable accuracy guarantees when compared to the conventional approach. Our analysis builds upon two simple structural results that boil down to randomized matrix multiplication, a fundamental and well-understood primitive of randomized linear algebra. We analyze the behavior of RFDA when leverage scores and ridge leverage scores are used to select predictor variables, and prove that accurate approximations can be achieved by a sample whose size depends on the effective degrees of freedom of the RFDA problem. Our results yield significant improvements over existing approaches and our empirical evaluations support our theoretical analyses.' volume: 115 URL: https://proceedings.mlr.press/v115/chowdhury20a.html PDF: http://proceedings.mlr.press/v115/chowdhury20a/chowdhury20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-chowdhury20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Agniva family: Chowdhury - given: Jiasen family: Yang - given: Petros family: Drineas editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 239-249 id: chowdhury20a issued: date-parts: - 2020 - 8 - 6 firstpage: 239 lastpage: 249 published: 2020-08-06 00:00:00 +0000 - title: 'Dynamic Trip-Vehicle Dispatch with Scheduled and On-Demand Requests' abstract: 'Transportation service providers that dispatch drivers and vehicles to riders have started to support both on-demand ride requests posted in real time and rides scheduled in advance, leading to new challenges which, to the best of our knowledge, have not been addressed by existing work. To fill the gap, we design novel trip-vehicle dispatch algorithms to handle both types of requests while taking into account an estimated request distribution of on-demand requests. At the core of the algorithms is the newly proposed Constrained Spatio-Temporal value function (CST-function), which is polynomial-time computable and represents the expected value a vehicle could gain with the constraint that it needs to arrive at a specific location at a given time. Built upon CST-function, we design a randomized best-fit algorithm for scheduled requests and an online planning algorithm for on-demand requests given the scheduled requests as constraints. We evaluate the algorithms through extensive experiments on a real-world dataset of an online ride-hailing platform.' volume: 115 URL: https://proceedings.mlr.press/v115/huang20a.html PDF: http://proceedings.mlr.press/v115/huang20a/huang20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-huang20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Taoan family: Huang - given: Bohui family: Fang - given: Xiaohui family: Bei - given: Fei family: Fang editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 250-260 id: huang20a issued: date-parts: - 2020 - 8 - 6 firstpage: 250 lastpage: 260 published: 2020-08-06 00:00:00 +0000 - title: 'Fall of Empires: Breaking Byzantine-tolerant SGD by Inner Product Manipulation' abstract: 'Recently, new defense techniques have been developed to tolerate Byzantine failures for distributed machine learning. The Byzantine model captures workers that behave arbitrarily, including malicious and compromised workers. In this paper, we break two prevailing Byzantine-tolerant techniques. Specifically, we show that two robust aggregation methods for synchronous SGD, namely coordinate-wise median and Krum, can be broken using new attack strategies based on inner product manipulation. We prove our results theoretically and validate them empirically.' volume: 115 URL: https://proceedings.mlr.press/v115/xie20a.html PDF: http://proceedings.mlr.press/v115/xie20a/xie20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-xie20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Cong family: Xie - given: Oluwasanmi family: Koyejo - given: Indranil family: Gupta editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 261-270 id: xie20a issued: date-parts: - 2020 - 8 - 6 firstpage: 261 lastpage: 270 published: 2020-08-06 00:00:00 +0000 - title: 'Adaptive Hashing for Model Counting' abstract: 'Randomized hashing algorithms have seen recent success in providing bounds on the model count of a propositional formula.
These methods repeatedly check the satisfiability of a formula subject to increasingly stringent random constraints. Key to these approaches is the choice of a fixed family of hash functions that strikes a good balance between computational efficiency and statistical guarantees for a hypothetical worst case formula. In this paper we propose a scheme where the family of hash functions is chosen adaptively, based on properties of the specific input formula. Akin to adaptive importance sampling, we use solutions to the formula (found during the bounding procedure of current methods) to estimate properties of the solution set, which guides the construction of random constraints. Additionally, we introduce an orthogonal variance reduction technique that is broadly applicable to hashing based methods. We empirically show that, when combined, these approaches lead to better lower bounds on existing benchmarks, with a median improvement factor of 2^13 over 1,198 propositional formulas.' volume: 115 URL: https://proceedings.mlr.press/v115/kuck20a.html PDF: http://proceedings.mlr.press/v115/kuck20a/kuck20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-kuck20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Jonathan family: Kuck - given: Tri family: Dao - given: Shengjia family: Zhao - given: Burak family: Bartan - given: Ashish family: Sabharwal - given: Stefano family: Ermon editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 271-280 id: kuck20a issued: date-parts: - 2020 - 8 - 6 firstpage: 271 lastpage: 280 published: 2020-08-06 00:00:00 +0000 - title: 'Towards a Better Understanding and Regularization of GAN Training Dynamics' abstract: 'Generative adversarial networks (GANs) are notoriously difficult to train and the reasons underlying their (non-)convergence behaviors are still not completely understood. By first considering a simple yet representative GAN example, we mathematically analyze its local convergence behavior in a non-asymptotic way. Furthermore, the analysis is extended to general GANs under certain assumptions. We find that in order to ensure a good convergence rate, two factors of the Jacobian in the GAN training dynamics should be simultaneously avoided, which are (i) the Phase Factor, i.e., the Jacobian has complex eigenvalues with a large imaginary-to-real ratio, and (ii) the Conditioning Factor, i.e., the Jacobian is ill-conditioned. Previous methods of regularizing the Jacobian can only alleviate one of these two factors, while making the other more severe. Thus we propose a new JAcobian REgularization (JARE) for GANs, which simultaneously addresses both factors by construction. Finally, we conduct experiments that confirm our theoretical analysis and demonstrate the advantages of JARE over previous methods in stabilizing GANs.' volume: 115 URL: https://proceedings.mlr.press/v115/nie20a.html PDF: http://proceedings.mlr.press/v115/nie20a/nie20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-nie20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Weili family: Nie - given: Ankit B. family: Patel editor: - given: Ryan P. 
family: Adams - given: Vibhav family: Gogate page: 281-291 id: nie20a issued: date-parts: - 2020 - 8 - 6 firstpage: 281 lastpage: 291 published: 2020-08-06 00:00:00 +0000 - title: 'Domain Generalization via Multidomain Discriminant Analysis' abstract: 'Domain generalization (DG) aims to incorporate knowledge from multiple source domains into a single model that could generalize well on unseen target domains. This problem is ubiquitous in practice since the distributions of the target data may rarely be identical to those of the source data. In this paper, we propose Multidomain Discriminant Analysis (MDA) to address DG of classification tasks in general situations. MDA learns a domain-invariant feature transformation that aims to achieve appealing properties, including a minimal divergence among domains within each class, a maximal separability among classes, and overall maximal compactness of all classes. Furthermore, we provide bounds on the excess risk and generalization error via learning-theoretic analysis. Comprehensive experiments on synthetic and real benchmark datasets demonstrate the effectiveness of MDA.' volume: 115 URL: https://proceedings.mlr.press/v115/hu20a.html PDF: http://proceedings.mlr.press/v115/hu20a/hu20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-hu20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Shoubo family: Hu - given: Kun family: Zhang - given: Zhitang family: Chen - given: Laiwan family: Chan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 292-302 id: hu20a issued: date-parts: - 2020 - 8 - 6 firstpage: 292 lastpage: 302 published: 2020-08-06 00:00:00 +0000 - title: 'Efficient Planning Under Uncertainty with Incremental Refinement' abstract: 'Online planning under uncertainty for robots and similar agents has very strict performance requirements in order to achieve reasonable behavior in complex domains with limited resources. The underlying process of decision-making and information gathering is correctly modeled by POMDPs, but their complexity makes many interesting and challenging problems virtually intractable. We address this issue by introducing a method to estimate relevance values for elements of a planning domain, allowing an agent to focus on promising features. This approach reduces the effective dimensionality of problems, allowing an agent to plan faster and collect higher rewards. Experimental validation was performed on two challenging POMDPs that resemble real-world robotic task planning, where it is crucial to interleave planning and acting in an efficient manner.' volume: 115 URL: https://proceedings.mlr.press/v115/saborio20a.html PDF: http://proceedings.mlr.press/v115/saborio20a/saborio20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-saborio20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Juan Carlos family: Saborío - given: Joachim family: Hertzberg editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 303-312 id: saborio20a issued: date-parts: - 2020 - 8 - 6 firstpage: 303 lastpage: 312 published: 2020-08-06 00:00:00 +0000 - title: 'Cubic Regularization with Momentum for Nonconvex Optimization' abstract: 'Momentum is a popular technique to accelerate convergence in practical training, and its impact on convergence guarantees has been well studied for first-order algorithms. However, such a successful acceleration technique has not yet been proposed for second-order algorithms in nonconvex optimization. In this paper, we apply the momentum scheme to cubic regularized (CR) Newton’s method and explore the potential for acceleration. Our numerical experiments on various nonconvex optimization problems demonstrate that the momentum scheme can substantially facilitate the convergence of cubic regularization, and perform even better than Nesterov’s acceleration scheme for CR. Theoretically, we prove that CR under momentum achieves the best possible convergence rate to a second-order stationary point for nonconvex optimization. Moreover, we study the proposed algorithm for solving problems satisfying an error bound condition and establish a local quadratic convergence rate. Then, particularly for finite-sum problems, we show that the proposed algorithm can allow computational inexactness that reduces the overall sample complexity without degrading the convergence rate.' volume: 115 URL: https://proceedings.mlr.press/v115/wang20c.html PDF: http://proceedings.mlr.press/v115/wang20c/wang20c.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-wang20c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Zhe family: Wang - given: Yi family: Zhou - given: Yingbin family: Liang - given: Guanghui family: Lan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 313-322 id: wang20c issued: date-parts: - 2020 - 8 - 6 firstpage: 313 lastpage: 322 published: 2020-08-06 00:00:00 +0000 - title: 'Stability of Linear Structural Equation Models of Causal Inference' abstract: 'We consider numerical stability of the parameter recovery problem in the Linear Structural Equation Model (LSEM) of causal inference. Numerical stability is essential for the recovered parameters to be reliable. A long line of work starting from Wright (1920) has focused on understanding which sub-classes of LSEM allow for efficient parameter recovery. Despite decades of study, this question is not yet fully resolved. The goal of the present paper is complementary to this line of work: we want to understand the stability of the recovery problem in the cases when efficient recovery is possible. Numerical stability of Pearl’s notion of causality was first studied in Schulman and Srivastava (2016) using the concept of condition number, where they provide ill-conditioned examples. In this work, we provide a condition number analysis for the LSEM. First, we prove that under a sufficient condition, for a certain sub-class of LSEMs that are bow-free (Brito and Pearl (2002)), parameter recovery is numerically stable. We further prove that randomly chosen input parameters for this family satisfy the condition with a substantial probability. Hence for this family, on a large subset of parameter space, recovery is stable. Next, we construct an example of an LSEM on four vertices with unbounded condition number.
We then corroborate our theoretical findings via simulations as well as real-world experiments for a sociology application. Finally, we provide a general heuristic for estimating the condition number of any LSEM instance.' volume: 115 URL: https://proceedings.mlr.press/v115/sankararaman20a.html PDF: http://proceedings.mlr.press/v115/sankararaman20a/sankararaman20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-sankararaman20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Karthik Abhinav family: Sankararaman - given: Anand family: Louis - given: Navin family: Goyal editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 323-333 id: sankararaman20a issued: date-parts: - 2020 - 8 - 6 firstpage: 323 lastpage: 333 published: 2020-08-06 00:00:00 +0000 - title: 'Random Sum-Product Networks: A Simple and Effective Approach to Probabilistic Deep Learning' abstract: 'Sum-product networks (SPNs) are expressive probabilistic models with a rich set of exact and efficient inference routines. However, in order to guarantee exact inference, they require specific structural constraints, which complicate learning SPNs from data. As a result, most SPN structure learners proposed so far are tedious to tune, do not scale easily, and are not easily integrated with deep learning frameworks. In this paper, we follow a simple “deep learning” approach: we generate unspecialized random structures, scalable to millions of parameters, and subsequently apply GPU-based optimization. Somewhat surprisingly, our models often perform on par with state-of-the-art SPN structure learners and deep neural networks on a diverse range of generative and discriminative scenarios. At the same time, our models yield well-calibrated uncertainties, and stand out among most deep generative and discriminative models in being robust to missing features and being able to detect anomalies.' volume: 115 URL: https://proceedings.mlr.press/v115/peharz20a.html PDF: http://proceedings.mlr.press/v115/peharz20a/peharz20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-peharz20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Robert family: Peharz - given: Antonio family: Vergari - given: Karl family: Stelzner - given: Alejandro family: Molina - given: Xiaoting family: Shao - given: Martin family: Trapp - given: Kristian family: Kersting - given: Zoubin family: Ghahramani editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 334-344 id: peharz20a issued: date-parts: - 2020 - 8 - 6 firstpage: 334 lastpage: 344 published: 2020-08-06 00:00:00 +0000 - title: 'Towards Robust Relational Causal Discovery' abstract: 'We consider the problem of learning causal relationships from relational data. Existing approaches rely on queries to a relational conditional independence (RCI) oracle to establish and orient causal relations in such a setting. In practice, queries to an RCI oracle have to be replaced by reliable tests for RCI against available data. Relational data present several unique challenges in testing for RCI. We study the conditions under which traditional iid-based CI tests yield reliable answers to RCI queries against relational data.
We show how to conduct CI tests against relational data to robustly recover the underlying relational causal structure. Results of our experiments demonstrate the effectiveness of our proposed approach.' volume: 115 URL: https://proceedings.mlr.press/v115/lee20a.html PDF: http://proceedings.mlr.press/v115/lee20a/lee20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-lee20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Sanghack family: Lee - given: Vasant family: Honavar editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 345-355 id: lee20a issued: date-parts: - 2020 - 8 - 6 firstpage: 345 lastpage: 355 published: 2020-08-06 00:00:00 +0000 - title: 'The Role of Memory in Stochastic Optimization' abstract: 'The choice of how to retain information about past gradients dramatically affects the convergence properties of state-of-the-art stochastic optimization methods, such as Heavy-ball, Nesterov’s momentum, RMSprop and Adam. Building on this observation, we use stochastic differential equations (SDEs) to explicitly study the role of memory in gradient-based algorithms. We first derive a general continuous-time model that can incorporate arbitrary types of memory, for both deterministic and stochastic settings. We provide convergence guarantees for this SDE for weakly-quasi-convex and quadratically growing functions. We then demonstrate how to discretize this SDE to get a flexible discrete-time algorithm that can implement a broad spectrum of memories ranging from short- to long-term. Not only does this algorithm increase the degrees of freedom in algorithmic choice for practitioners, but it also comes with better stability properties than classical momentum in the convex stochastic setting. In particular, no iterate averaging is needed for convergence. Interestingly, our analysis also provides a novel interpretation of Nesterov’s momentum as stable gradient amplification and highlights a possible reason for its unstable behavior in the (convex) stochastic setting. Furthermore, we discuss the use of long-term memory for second-moment estimation in adaptive methods, such as Adam and RMSprop. Finally, we provide an extensive experimental study of the effect of different types of memory in both convex and nonconvex settings.' volume: 115 URL: https://proceedings.mlr.press/v115/orvieto20a.html PDF: http://proceedings.mlr.press/v115/orvieto20a/orvieto20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-orvieto20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Antonio family: Orvieto - given: Jonas family: Kohler - given: Aurelien family: Lucchi editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 356-366 id: orvieto20a issued: date-parts: - 2020 - 8 - 6 firstpage: 356 lastpage: 366 published: 2020-08-06 00:00:00 +0000 - title: 'Random Search and Reproducibility for Neural Architecture Search' abstract: 'Neural architecture search (NAS) is a promising research direction that has the potential to replace expert-designed networks with learned, task-specific architectures.
In order to help ground the empirical results in this field, we propose new NAS baselines that build on the following observations: (i) NAS is a specialized hyperparameter optimization problem; and (ii) random search is a competitive baseline for hyperparameter optimization. Leveraging these observations, we evaluate both random search with early-stopping and a novel random search with weight-sharing algorithm on two standard NAS benchmarks—PTB and CIFAR-10. Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on both benchmarks. Additionally, random search with weight-sharing outperforms random search with early-stopping, achieving a state-of-the-art NAS result on PTB and a highly competitive result on CIFAR-10. Finally, we explore the existing reproducibility issues of published NAS results.' volume: 115 URL: https://proceedings.mlr.press/v115/li20c.html PDF: http://proceedings.mlr.press/v115/li20c/li20c.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-li20c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Liam family: Li - given: Ameet family: Talwalkar editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 367-377 id: li20c issued: date-parts: - 2020 - 8 - 6 firstpage: 367 lastpage: 377 published: 2020-08-06 00:00:00 +0000 - title: 'Joint Nonparametric Precision Matrix Estimation with Confounding' abstract: 'We consider the problem of precision matrix estimation where, due to extraneous confounding of the underlying precision matrix, the data are independent but not identically distributed. While such confounding occurs in many scientific problems, our approach is inspired by recent neuroscientific research suggesting that brain function, as measured using functional magnetic resonance imaging (fMRI), is susceptible to confounding by physiological noise such as breathing and subject motion. Following the scientific motivation, we propose a graphical model, which in turn motivates a joint nonparametric estimator. We provide theoretical guarantees for the consistency and the convergence rate of the proposed estimator. In addition, we demonstrate that the optimization of the proposed estimator can be transformed into a series of linear programming problems, and thus be efficiently solved in parallel. Empirical results are presented using simulated and real brain imaging data, which suggest that our approach improves precision matrix estimation, as compared to baselines, when confounding is present.' volume: 115 URL: https://proceedings.mlr.press/v115/geng20a.html PDF: http://proceedings.mlr.press/v115/geng20a/geng20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-geng20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Sinong family: Geng - given: Mladen family: Kolar - given: Oluwasanmi family: Koyejo editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 378-388 id: geng20a issued: date-parts: - 2020 - 8 - 6 firstpage: 378 lastpage: 388 published: 2020-08-06 00:00:00 +0000 - title: 'General Identifiability with Arbitrary Surrogate Experiments' abstract: 'We study the problem of causal identification from an arbitrary collection of observational and experimental distributions, and substantive knowledge about the phenomenon under investigation, which usually comes in the form of a causal graph. We call this problem \textit{g-identifiability}, or gID for short. The gID setting encompasses two well-known problems in causal inference, namely, identifiability [Pearl, 1995] and z-identifiability [Bareinboim and Pearl, 2012] – the former assumes that an observational distribution is necessarily available, and no experiments can be performed, conditions that are both relaxed in the gID setting; the latter assumes that \textit{all} combinations of experiments are available, i.e., the power set of the experimental set $\mathbf{Z}$, which gID does not require a priori. In this paper, we introduce a general strategy to prove non-gID based on \textit{hedgelets} and \textit{thickets}, which leads to a necessary and sufficient graphical condition for the corresponding decision problem. We further develop a procedure for systematically computing the target effect, and prove that it is sound and complete for gID instances. In other words, failure of the algorithm in returning an expression implies that the target effect is not computable from the available distributions. Finally, as a corollary of these results, we show that do-calculus is complete for the task of g-identifiability.' volume: 115 URL: https://proceedings.mlr.press/v115/lee20b.html PDF: http://proceedings.mlr.press/v115/lee20b/lee20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-lee20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Sanghack family: Lee - given: Juan D. family: Correa - given: Elias family: Bareinboim editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 389-398 id: lee20b issued: date-parts: - 2020 - 8 - 6 firstpage: 389 lastpage: 398 published: 2020-08-06 00:00:00 +0000 - title: 'Differentiable Probabilistic Models of Scientific Imaging with the Fourier Slice Theorem' abstract: 'Scientific imaging techniques such as optical and electron microscopy and computed tomography (CT) scanning are used to study the 3D structure of an object through 2D observations. These observations are related to the original 3D object through orthogonal integral projections. For common 3D reconstruction algorithms, computational efficiency requires the modeling of the 3D structures to take place in Fourier space by applying the Fourier slice theorem. At present, it is unclear how to differentiate through the projection operator, and hence current learning algorithms can not rely on gradient based methods to optimize 3D structure models. In this paper we show how back-propagation through the projection operator in Fourier space can be achieved. We demonstrate the validity of the approach with experiments on 3D reconstruction of proteins. We further extend our approach to learning probabilistic models of 3D objects. This allows us to predict regions of low sampling rates or estimate noise. 
A higher sample efficiency can be reached by utilizing the learned uncertainties of the 3D structure as an unsupervised estimate of the model fit. Finally, we demonstrate how the reconstruction algorithm can be extended with an amortized inference scheme on unknown attributes such as object pose. Through empirical studies we show that joint inference of the 3D structure and the object pose becomes more difficult when the ground truth object contains more symmetries. Due to the presence of, for instance, (approximate) rotational symmetries, the pose estimation can easily get stuck in local optima, inhibiting a fine-grained high-quality estimate of the 3D structure. ' volume: 115 URL: https://proceedings.mlr.press/v115/ullrich20a.html PDF: http://proceedings.mlr.press/v115/ullrich20a/ullrich20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-ullrich20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Karen family: Ullrich - given: Rianne prefix: van den family: Berg - given: Marcus family: Brubaker - given: David family: Fleet - given: Max family: Welling editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 399-411 id: ullrich20a issued: date-parts: - 2020 - 8 - 6 firstpage: 399 lastpage: 411 published: 2020-08-06 00:00:00 +0000 - title: 'Approximate Inference in Structured Instances with Noisy Categorical Observations' abstract: 'We study the problem of recovering the latent ground truth labeling of a structured instance with categorical random variables in the presence of noisy observations. We present a new approximate algorithm for graphs with categorical variables that achieves low Hamming error in the presence of noisy vertex and edge observations. Our main result shows a logarithmic dependence of the Hamming error on the number of categories of the random variables. Our approach draws connections to correlation clustering with a fixed number of clusters. Our results generalize the works of Globerson et al. (2015) and Foster et al. (2018), who study the hardness of structured prediction under binary labels, to the case of categorical labels.' volume: 115 URL: https://proceedings.mlr.press/v115/heidari20a.html PDF: http://proceedings.mlr.press/v115/heidari20a/heidari20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-heidari20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Alireza family: Heidari - given: Ihab F. family: Ilyas - given: Theodoros family: Rekatsinas editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 412-421 id: heidari20a issued: date-parts: - 2020 - 8 - 6 firstpage: 412 lastpage: 421 published: 2020-08-06 00:00:00 +0000 - title: 'Randomized Value Functions via Multiplicative Normalizing Flows' abstract: 'Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high-dimensional state and action spaces. Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values. This prevents the agent’s behavior policy from prematurely exploiting early estimates and falling into local optima.
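The Fourier slice theorem underlying the ullrich20a abstract above is easy to verify numerically in its simplest discrete form (2-D object, axis-aligned projection): the 1-D FFT of a projection equals a central slice of the 2-D FFT.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))      # toy 2-D "object"

projection = img.sum(axis=0)         # integrate along one axis: a 1-D projection
central_slice = np.fft.fft2(img)[0]  # zero-frequency row of the 2-D FFT

# Fourier slice theorem: the FFT of the projection equals the central slice.
assert np.allclose(np.fft.fft(projection), central_slice)
```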
In this work, we leverage recent advances in variational Bayesian neural networks and combine these with traditional Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) to achieve randomized value functions for high-dimensional domains. In particular, we augment DQN and DDPG with multiplicative normalizing flows in order to track a rich approximate posterior distribution over the parameters of the value function. This allows the agent to perform approximate Thompson sampling in a computationally efficient manner via stochastic gradient methods. We demonstrate the benefits of our approach through an empirical comparison in high-dimensional environments. ' volume: 115 URL: https://proceedings.mlr.press/v115/touati20a.html PDF: http://proceedings.mlr.press/v115/touati20a/touati20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-touati20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Ahmed family: Touati - given: Harsh family: Satija - given: Joshua family: Romoff - given: Joelle family: Pineau - given: Pascal family: Vincent editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 422-432 id: touati20a issued: date-parts: - 2020 - 8 - 6 firstpage: 422 lastpage: 432 published: 2020-08-06 00:00:00 +0000 - title: 'A Fast Proximal Point Method for Computing Exact Wasserstein Distance' abstract: 'Wasserstein distance plays increasingly important roles in machine learning, stochastic programming and image processing. Major efforts have been under way to address its high computational complexity, some leading to approximate or regularized variations such as Sinkhorn distance. However, as we will demonstrate, regularized variations with a large regularization parameter will degrade performance in several important machine learning applications, while a small regularization parameter will fail due to numerical stability issues with existing algorithms. We address this challenge by developing an Inexact Proximal point method for exact Optimal Transport problem (IPOT) with the proximal operator approximately evaluated at each iteration using projections to the probability simplex. The algorithm (a) converges to exact Wasserstein distance with a theoretical guarantee and robust regularization parameter selection, (b) alleviates numerical stability issues, (c) has similar computational complexity to Sinkhorn, and (d) avoids the shrinking problem when applied to generative models. Furthermore, a new algorithm is proposed based on IPOT to obtain sharper Wasserstein barycenters.' volume: 115 URL: https://proceedings.mlr.press/v115/xie20b.html PDF: http://proceedings.mlr.press/v115/xie20b/xie20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-xie20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Yujia family: Xie - given: Xiangfeng family: Wang - given: Ruijia family: Wang - given: Hongyuan family: Zha editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 433-453 id: xie20b issued: date-parts: - 2020 - 8 - 6 firstpage: 433 lastpage: 453 published: 2020-08-06 00:00:00 +0000 - title: 'Neural Dynamics Discovery via Gaussian Process Recurrent Neural Networks' abstract: 'Latent dynamics discovery, the extraction of complex dynamics from high-dimensional, noisy neural data, is challenging. Many dimensionality reduction methods have been widely adopted to extract low-dimensional, smooth and time-evolving latent trajectories. However, simple state transition structures, linear embedding assumptions, or inflexible inference networks impede the accurate recovery of dynamic portraits. In this paper, we propose a novel latent dynamic model that is capable of capturing nonlinear, non-Markovian, long short-term time-dependent dynamics via recurrent neural networks and tackling complex nonlinear embedding via a non-parametric Gaussian process. Due to the complexity and intractability of the model and its inference, we also provide a powerful inference network with bi-directional long short-term memory networks that encode both past and future information into posterior distributions. In our experiments, we show that our model outperforms other state-of-the-art methods in reconstructing insightful latent dynamics from both simulated and experimental neural datasets with either Gaussian or Poisson observations, especially in the low-sample scenario. Our codes and additional materials are available at https://github.com/sheqi/GP-RNN_UAI2019.' volume: 115 URL: https://proceedings.mlr.press/v115/she20a.html PDF: http://proceedings.mlr.press/v115/she20a/she20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-she20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Qi family: She - given: Anqi family: Wu editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 454-464 id: she20a issued: date-parts: - 2020 - 8 - 6 firstpage: 454 lastpage: 464 published: 2020-08-06 00:00:00 +0000 - title: 'Fisher-Bures Adversary Graph Convolutional Networks' abstract: 'In a graph convolutional network, we assume that the graph $G$ is generated w.r.t. some observation noise. During learning, we make small random perturbations $\Delta{}G$ of the graph and try to improve generalization. Based on quantum information geometry, $\Delta{}G$ can be characterized by the eigendecomposition of the graph Laplacian matrix. We try to minimize the loss w.r.t. the perturbed $G+\Delta{G}$ while making $\Delta{G}$ effective in terms of the Fisher information of the neural network. Our proposed model can consistently improve graph convolutional networks on semi-supervised node classification tasks with reasonable computational overhead. We present three different geometries on the manifold of graphs: the intrinsic geometry measures the information theoretic dynamics of a graph; the extrinsic geometry characterizes how such dynamics can affect externally a graph neural network; the embedding geometry is for measuring node embeddings. These new analytical tools are useful in developing a good understanding of graph neural networks and fostering new techniques.'
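The proximal-point scheme in the xie20b (IPOT) abstract admits a compact sketch: each outer iteration re-centers an entropic kernel at the current transport plan and performs a few Sinkhorn-type scalings. The loop structure below follows the abstract's description; the step counts and inner-iteration budget are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def ipot(mu, nu, C, beta=1.0, outer=200, inner=1):
    """Proximal point iteration for OT: re-center an entropic kernel at the
    current plan, then take a few Sinkhorn-type scalings per outer step."""
    G = np.exp(-C / beta)
    T = np.ones_like(C) / C.size
    b = np.ones(C.shape[1])
    for _ in range(outer):
        Q = G * T                          # kernel re-centered at current plan
        for _ in range(inner):
            a = mu / (Q @ b)
            b = nu / (Q.T @ a)
        T = a[:, None] * Q * b[None, :]
    return T

x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :]) ** 2         # squared-distance cost
mu = nu = np.ones(50) / 50
T = ipot(mu, nu, C)
print("transport cost:", np.sum(T * C))    # ~0 for identical marginals
```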
volume: 115 URL: https://proceedings.mlr.press/v115/sun20a.html PDF: http://proceedings.mlr.press/v115/sun20a/sun20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-sun20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Ke family: Sun - given: Piotr family: Koniusz - given: Zhen family: Wang editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 465-475 id: sun20a issued: date-parts: - 2020 - 8 - 6 firstpage: 465 lastpage: 475 published: 2020-08-06 00:00:00 +0000 - title: 'Epsilon-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning' abstract: 'Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which, despite its simplicity, remains one of the most frequently used forms of exploration. However, a key limitation of this policy is the specification of epsilon. In this paper, we provide a novel Bayesian perspective of epsilon as a measure of the uncertainty (and hence convergence) in the Q-value function. Based on this new perspective, we introduce a closed-form Bayesian model update built on Bayesian model combination (BMC), which allows us to adapt epsilon using experiences from the environment in constant time with monotone convergence guarantees. We demonstrate that our proposed algorithm, epsilon-BMC, efficiently balances exploration and exploitation on different problems, performing comparably to or outperforming the best tuned fixed annealing schedules and an alternative data-dependent epsilon adaptation scheme proposed in the literature.' volume: 115 URL: https://proceedings.mlr.press/v115/gimelfarb20a.html PDF: http://proceedings.mlr.press/v115/gimelfarb20a/gimelfarb20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-gimelfarb20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Michael family: Gimelfarb - given: Scott family: Sanner - given: Chi-Guhn family: Lee editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 476-485 id: gimelfarb20a issued: date-parts: - 2020 - 8 - 6 firstpage: 476 lastpage: 485 published: 2020-08-06 00:00:00 +0000 - title: 'Periodic Kernel Approximation by Index Set Fourier Series Features' abstract: 'Periodicity is often studied in time-series modelling with autoregressive methods but is less popular in the kernel literature, particularly for multi-dimensional problems such as in textures, crystallography, quantum mechanics, and robotics. Large datasets often make modelling periodicity untenable for otherwise powerful non-parametric methods like Gaussian Processes (GPs), which typically incur an $\mathcal{O}(N^3)$ computational cost, while approximate feature methods are impeded by their approximate accuracy. We introduce a method that efficiently decomposes multi-dimensional periodic kernels into a set of basis functions by exploiting multivariate Fourier series.
We show that our approximation, termed \emph{Index Set Fourier Series Features}, produces significantly less predictive generalisation error on regression problems with periodic data than alternative approximations such as those based on random and deterministic Fourier features. ' volume: 115 URL: https://proceedings.mlr.press/v115/tompkins20a.html PDF: http://proceedings.mlr.press/v115/tompkins20a/tompkins20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-tompkins20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Anthony family: Tompkins - given: Fabio family: Ramos editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 486-496 id: tompkins20a issued: date-parts: - 2020 - 8 - 6 firstpage: 486 lastpage: 496 published: 2020-08-06 00:00:00 +0000 - title: 'Efficient Neural Network Verification with Exactness Characterization' abstract: 'Remarkable progress has been made on verification of neural networks, i.e., showing that neural networks are provably consistent with specifications encoding properties like adversarial robustness. Recent methods developed for scalable neural network verification are based on computing an upper bound on the worst-case violation of the specification. Semidefinite programming (SDP) has been proposed as a means to obtain tight upper bounds. However, SDP solvers do not scale to large neural networks. We introduce a Lagrangian relaxation based on the SDP formulation and a novel algorithm to solve the relaxation that scales to networks that are two orders of magnitude larger than those handled by off-the-shelf SDP solvers. Although verification of neural networks is known to be NP-hard in general, we develop the first known sufficient conditions under which a polynomial time verification algorithm (based on the above relaxation) is guaranteed to perform exact verification (i.e., either verify a property or establish it is untrue). The algorithm can be implemented using primitives available readily in common deep learning frameworks. Experiments show that the algorithm is fast, and is able to compute tight upper bounds on the error rates under adversarial attacks of convolutional networks trained on MNIST and CIFAR-10.' volume: 115 URL: https://proceedings.mlr.press/v115/dvijotham20a.html PDF: http://proceedings.mlr.press/v115/dvijotham20a/dvijotham20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-dvijotham20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Krishnamurthy (Dj) family: Dvijotham - given: Robert family: Stanforth - given: Sven family: Gowal - given: Chongli family: Qin - given: Soham family: De - given: Pushmeet family: Kohli editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 497-507 id: dvijotham20a issued: date-parts: - 2020 - 8 - 6 firstpage: 497 lastpage: 507 published: 2020-08-06 00:00:00 +0000 - title: 'Augmenting and Tuning Knowledge Graph Embeddings' abstract: 'Knowledge graph embeddings rank among the most successful methods for link prediction in knowledge graphs, i.e., the task of completing an incomplete collection of relational facts.
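The tompkins20a idea of deterministically decomposing a periodic kernel into Fourier-series features can be illustrated in one dimension; the paper's index-set construction handles multi-dimensional kernels, and the coefficients a_m = 1/(1+m^2) below are an arbitrary choice for the demo.

```python
import numpy as np

def fourier_series_features(x, M=10, omega=2 * np.pi):
    """Features whose inner product reproduces the truncated periodic kernel
    k(x, x') = sum_{m=0}^{M} a_m cos(m * omega * (x - x'))."""
    a = lambda m: 1.0 if m == 0 else 1.0 / (1.0 + m ** 2)
    feats = [np.sqrt(a(0)) * np.ones_like(x)]
    for m in range(1, M + 1):
        feats += [np.sqrt(a(m)) * np.cos(m * omega * x),
                  np.sqrt(a(m)) * np.sin(m * omega * x)]
    return np.stack(feats, axis=-1)

x = np.linspace(0, 1, 5)
Phi = fourier_series_features(x)
K_feat = Phi @ Phi.T
K_true = sum((1.0 if m == 0 else 1.0 / (1.0 + m ** 2))
             * np.cos(m * 2 * np.pi * (x[:, None] - x[None, :]))
             for m in range(11))
assert np.allclose(K_feat, K_true)   # exact for the truncated series
```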
A downside of these models is their strong sensitivity to model hyperparameters, in particular regularizers, which have to be extensively tuned to reach good performance [Kadlec et al., 2017]. We propose an efficient method for large scale hyperparameter tuning by interpreting these models in a probabilistic framework. After a model augmentation that introduces per-entity hyperparameters, we use a variational expectation-maximization approach to tune thousands of such hyperparameters with minimal additional cost. Our approach is agnostic to details of the model and results in a new state of the art in link prediction on standard benchmark data.' volume: 115 URL: https://proceedings.mlr.press/v115/bamler20a.html PDF: http://proceedings.mlr.press/v115/bamler20a/bamler20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-bamler20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Robert family: Bamler - given: Farnood family: Salehi - given: Stephan family: Mandt editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 508-518 id: bamler20a issued: date-parts: - 2020 - 8 - 6 firstpage: 508 lastpage: 518 published: 2020-08-06 00:00:00 +0000 - title: 'A Tighter Analysis of Randomised Policy Iteration' abstract: 'Policy Iteration (PI) is a popular family of algorithms to compute an optimal policy for a given Markov Decision Problem (MDP). Starting from an arbitrary initial policy, PI repeatedly performs locally-improving switches until an optimal policy is found. The exact form of the switching rule gives rise to different variants of PI. Two decades ago, Mansour and Singh [1999] provided the first non-trivial “strong” upper bound on the number of iterations taken by “Howard’s PI” (HPI), a widely-used variant of PI (strong bounds depend only on the number of states and actions in the MDP). They also proposed a randomised variant (RPI) and showed an even tighter strong upper bound. Their bounds for HPI and RPI have not been improved subsequently. We revisit the algorithms and analysis of Mansour and Singh [1999]. We prove a novel result on the structure of the policy space for $k$-action MDPs, $k\geq 2$, which generalises a result known for $k = 2$. Also proposing a new counting argument, we obtain a strong bound of $(O(\sqrt{k \log k}))^n$ iterations for an algorithm akin to RPI, improving significantly upon Mansour and Singh’s original bound of roughly $O((k/2)^n)$. Similar analysis of a randomised variant of HPI also yields a strong upper bound of $(O(\sqrt{k \log k}))^n$ iterations, registering the first exponential improvement for HPI over the trivial bound of $k^n$. Our other contributions include a lower bound of $\Omega(n)$ iterations for RPI and an upper bound of $1.6001^n$ iterations for a randomised variant of “Batch-Switching PI” [Kalyanakrishnan et al., 2016a] on 2-action MDPs—the tightest strong upper bound shown yet for the PI family.'
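For reference alongside the taraviya20a analysis, here is the textbook Howard's PI baseline on a finite MDP, which switches every improvable state to its greedy action; the randomised variants studied in the paper modify this improvement step. A minimal sketch, not the paper's algorithms:

```python
import numpy as np

def howard_pi(P, R, gamma=0.9):
    """Howard's PI: exact policy evaluation, then switch every improvable
    state to its greedy action, until no switch remains.
    P: (A, S, S) transitions, R: (S, A) rewards."""
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)
    while True:
        P_pi = P[pi, np.arange(S), :]                  # (S, S) under pi
        r_pi = R[np.arange(S), pi]                     # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        pi_new = (R.T + gamma * P @ v).argmax(axis=0)  # greedy improvement
        if np.array_equal(pi_new, pi):
            return pi, v
        pi = pi_new

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # 2 actions, 3 states
R = rng.normal(size=(3, 2))
print(howard_pi(P, R))
```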
volume: 115 URL: https://proceedings.mlr.press/v115/taraviya20a.html PDF: http://proceedings.mlr.press/v115/taraviya20a/taraviya20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-taraviya20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Meet family: Taraviya - given: Shivaram family: Kalyanakrishnan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 519-529 id: taraviya20a issued: date-parts: - 2020 - 8 - 6 firstpage: 519 lastpage: 529 published: 2020-08-06 00:00:00 +0000 - title: 'Perturbed-History Exploration in Stochastic Linear Bandits' abstract: 'We propose a new online algorithm for cumulative regret minimization in a stochastic linear bandit. The algorithm pulls the arm with the highest estimated reward in a linear model trained on its perturbed history. Therefore, we call it perturbed-history exploration in a linear bandit (LinPHE). The perturbed history is a mixture of observed rewards and randomly generated i.i.d. pseudo-rewards. We derive a $\tilde{O}(d \sqrt{n})$ gap-free bound on the $n$-round regret of LinPHE, where $d$ is the number of features. The key steps in our analysis are new concentration and anti-concentration bounds on the weighted sum of Bernoulli random variables. To show the generality of our design, we generalize LinPHE to a logistic model. We evaluate our algorithms empirically and show that they are practical.' volume: 115 URL: https://proceedings.mlr.press/v115/kveton20a.html PDF: http://proceedings.mlr.press/v115/kveton20a/kveton20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-kveton20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Branislav family: Kveton - given: Csaba family: Szepesvári - given: Mohammad family: Ghavamzadeh - given: Craig family: Boutilier editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 530-540 id: kveton20a issued: date-parts: - 2020 - 8 - 6 firstpage: 530 lastpage: 540 published: 2020-08-06 00:00:00 +0000 - title: 'An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient' abstract: 'We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by \citet{papini2018stochastic} for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an $\epsilon$-approximate stationary point of the performance function within $O(1/\epsilon^{5/3})$ trajectories. This sample complexity improves upon the best known result $O(1/\epsilon^2)$ by a factor of $O(1/\epsilon^{1/3})$. At the core of our analysis is (i) a tighter upper bound for the variance of importance sampling weights, where we prove that the variance can be controlled by the parameter distance between different policies; and (ii) a fine-grained analysis of the epoch length and batch size parameters such that we can significantly reduce the number of trajectories required in each iteration of SVRPG. We also empirically validate our theoretical claims about batch sizes on reinforcement learning benchmark tasks.
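The perturbed-history principle in the kveton20a abstract reduces to "fit least squares on the history plus random pseudo-rewards, then act greedily". The sketch below is a loose illustration under simplifying assumptions of my own (round-robin warm start, Bernoulli(1/2) pseudo-rewards resampled every round); the paper specifies the exact perturbation schedule.

```python
import numpy as np

def linphe_sketch(arms, reward_fn, n_rounds=500, a=1.0, seed=0):
    """Each round: refit least squares on the observed history augmented with
    random Bernoulli pseudo-rewards, then pull the greedy arm."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for t in range(n_rounds):
        if t < arms.shape[1]:                    # round-robin warm start
            idx = t % len(arms)
        else:
            Xh, yh = np.array(X), np.array(y)
            n_pseudo = int(a * len(yh))          # `a` pseudo-rewards per sample
            Xp = Xh[rng.integers(0, len(yh), size=n_pseudo)]
            yp = rng.integers(0, 2, size=n_pseudo).astype(float)
            theta = np.linalg.lstsq(np.vstack([Xh, Xp]),
                                    np.concatenate([yh, yp]), rcond=None)[0]
            idx = int(np.argmax(arms @ theta))
        X.append(arms[idx]); y.append(reward_fn(arms[idx]))
    return np.array(y)

rng = np.random.default_rng(1)
arms = rng.normal(size=(10, 4))
theta_star = rng.normal(size=4)
print(linphe_sketch(arms, lambda x: x @ theta_star + 0.1 * rng.normal()).mean())
```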
' volume: 115 URL: https://proceedings.mlr.press/v115/xu20a.html PDF: http://proceedings.mlr.press/v115/xu20a/xu20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-xu20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Pan family: Xu - given: Felicia family: Gao - given: Quanquan family: Gu editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 541-551 id: xu20a issued: date-parts: - 2020 - 8 - 6 firstpage: 541 lastpage: 551 published: 2020-08-06 00:00:00 +0000 - title: 'Deep Mixture of Experts via Shallow Embedding' abstract: 'Larger networks generally have greater representational power at the cost of increased computational complexity. Sparsifying such networks has been an active area of research but has been generally limited to static regularization or dynamic approaches using reinforcement learning. We explore a mixture of experts (MoE) approach to deep dynamic routing, which activates certain experts in the network on a per-example basis. Our novel DeepMoE architecture increases the representational power of standard convolutional networks by adaptively sparsifying and recalibrating channel-wise features in each convolutional layer. We employ a multi-headed sparse gating network to determine the selection and scaling of channels for each input, leveraging exponential combinations of experts within a single convolutional network. Our proposed architecture is evaluated on four benchmark datasets and tasks, and we show that Deep-MoEs are able to achieve higher accuracy with lower computation than standard convolutional networks.' volume: 115 URL: https://proceedings.mlr.press/v115/wang20d.html PDF: http://proceedings.mlr.press/v115/wang20d/wang20d.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-wang20d.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Xin family: Wang - given: Fisher family: Yu - given: Lisa family: Dunlap - given: Yi-An family: Ma - given: Ruth family: Wang - given: Azalia family: Mirhoseini - given: Trevor family: Darrell - given: Joseph E. family: Gonzalez editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 552-562 id: wang20d issued: date-parts: - 2020 - 8 - 6 firstpage: 552 lastpage: 562 published: 2020-08-06 00:00:00 +0000 - title: 'Sampling-Free Variational Inference of Bayesian Neural Networks by Variance Backpropagation' abstract: ' We propose a new Bayesian Neural Net formulation that affords variational inference for which the evidence lower bound is analytically tractable subject to a tight approximation. We achieve this tractability by (i) decomposing ReLU nonlinearities into the product of an identity and a Heaviside step function, (ii) introducing a separate path that decomposes the neural net expectation from its variance. We demonstrate formally that introducing separate latent binary variables to the activations allows representing the neural network likelihood as a chain of linear operations. Performing variational inference on this construction enables a sampling-free computation of the evidence lower bound which is a more effective approximation than the widely applied Monte Carlo sampling and CLT related techniques. 
We evaluate the model on a range of regression and classification tasks against BNN inference alternatives, showing performance competitive with or better than the current state of the art. ' volume: 115 URL: https://proceedings.mlr.press/v115/haussmann20a.html PDF: http://proceedings.mlr.press/v115/haussmann20a/haussmann20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-haussmann20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Manuel family: Haußmann - given: Fred A. family: Hamprecht - given: Melih family: Kandemir editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 563-573 id: haussmann20a issued: date-parts: - 2020 - 8 - 6 firstpage: 563 lastpage: 573 published: 2020-08-06 00:00:00 +0000 - title: 'Sliced Score Matching: A Scalable Approach to Density and Score Estimation' abstract: 'Score matching is a popular method for estimating unnormalized statistical models. However, it has so far been limited to simple, shallow models or low-dimensional data, due to the difficulty of computing the Hessian of log-density functions. We show this difficulty can be mitigated by projecting the scores onto random vectors before comparing them. This objective, called sliced score matching, only involves Hessian-vector products, which can be easily implemented using reverse-mode automatic differentiation. Therefore, sliced score matching is amenable to more complex models and higher dimensional data compared to score matching. Theoretically, we prove the consistency and asymptotic normality of sliced score matching estimators. Moreover, we demonstrate that sliced score matching can be used to learn deep score estimators for implicit distributions. In our experiments, we show sliced score matching can learn deep energy-based models effectively, and can produce accurate score estimates for applications such as variational inference with implicit distributions and training Wasserstein Auto-Encoders.' volume: 115 URL: https://proceedings.mlr.press/v115/song20a.html PDF: http://proceedings.mlr.press/v115/song20a/song20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-song20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Yang family: Song - given: Sahaj family: Garg - given: Jiaxin family: Shi - given: Stefano family: Ermon editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 574-584 id: song20a issued: date-parts: - 2020 - 8 - 6 firstpage: 574 lastpage: 584 published: 2020-08-06 00:00:00 +0000 - title: 'Beyond Structural Causal Models: Causal Constraints Models' abstract: 'Structural Causal Models (SCMs) provide a popular causal modeling framework. In this work, we show that SCMs are not flexible enough to give a complete causal representation of dynamical systems at equilibrium. Instead, we propose a generalization of the notion of an SCM, that we call Causal Constraints Model (CCM), and prove that CCMs do capture the causal semantics of such systems. We show how CCMs can be constructed from differential equations and initial conditions and we illustrate our ideas further on a simple but ubiquitous (bio)chemical reaction. Our framework also allows one to model functional laws, such as the ideal gas law, in a sensible and intuitive way.'
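The sliced score matching objective of song20a can be estimated with nothing more than score evaluations. In the sketch below the Hessian-vector term is approximated by a central finite difference standing in for the reverse-mode autodiff the paper uses, and projection vectors are drawn uniformly on the sphere; both are simplifying assumptions of this demo.

```python
import numpy as np

def ssm_loss(score, xs, eps=1e-4, seed=0):
    """Monte Carlo estimate of E_x E_v [v.d/dx(v.s(x)) + 0.5 (v.s(x))^2],
    with the Hessian-vector term taken by a central finite difference."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for x in xs:
        v = rng.normal(size=x.shape)
        v /= np.linalg.norm(v)               # random slicing direction
        hvp = (v @ score(x + eps * v) - v @ score(x - eps * v)) / (2 * eps)
        total += hvp + 0.5 * (v @ score(x)) ** 2
    return total / len(xs)

# sanity check with the true score of N(0, I), s(x) = -x: the first term is
# -1 and the second averages to 0.5, so the loss should be close to -0.5
xs = np.random.default_rng(0).normal(size=(2000, 3))
print(ssm_loss(lambda x: -x, xs))
```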
volume: 115 URL: https://proceedings.mlr.press/v115/blom20a.html PDF: http://proceedings.mlr.press/v115/blom20a/blom20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-blom20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Tineke family: Blom - given: Stephan family: Bongers - given: Joris M. family: Mooij editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 585-594 id: blom20a issued: date-parts: - 2020 - 8 - 6 firstpage: 585 lastpage: 594 published: 2020-08-06 00:00:00 +0000 - title: 'Be Greedy: How Chromatic Number meets Regret Minimization in Graph Bandits' abstract: 'We study the classical linear bandit problem on \emph{graphs} modelling arm rewards through an underlying graph structure $G(N, E)$ such that rewards of neighboring nodes are similar. Previous attempts along this line have primarily considered the arm rewards to be a smooth function over the graph Laplacian, which however failed to characterize the inherent problem complexity in terms of the graph structure. We bridge this gap by showing a regret guarantee of $\tilde{O}(\chi(\overline{G})\sqrt{T})$ ($\tilde{O}(\cdot)$ notation hides dependencies on $\log T$), that scales only with the chromatic number of the complement graph $\chi(\overline{G})$, assuming the rewards to be a smooth function over a general class of graph embeddings—\emph{Orthonormal Representations}. Our proposed algorithms yield a regret guarantee of $\tilde{O}(r\sqrt{T})$ for any general embedding of rank $r$. Furthermore, if the rewards correspond to a minimum rank embedding, the regret boils down to $O(\chi(\overline{G})\sqrt{T})$—none of the existing works were able to bring out such influences of graph structures on arm rewards. Finally, noting that computing the above minimum rank embedding is NP-Hard, we also propose an alternative $O(N + |E|)$ time computable embedding scheme—{\it Greedy Embeddings}—based on greedy graph coloring, on which our algorithms perform optimally on a large family of graphs, e.g., unions of cliques, complements of $k$-colorable graphs, regular graphs, etc., and are also shown to outperform the state-of-the-art methods on real datasets. Our findings open up new roads for exploiting graph structures on regret performance.' volume: 115 URL: https://proceedings.mlr.press/v115/s20a.html PDF: http://proceedings.mlr.press/v115/s20a/s20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-s20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Shreyas family: S - given: Aadirupa family: Saha - given: Chiranjib family: Bhattacharyya editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 595-605 id: s20a issued: date-parts: - 2020 - 8 - 6 firstpage: 595 lastpage: 605 published: 2020-08-06 00:00:00 +0000 - title: 'Approximate Causal Abstractions' abstract: 'Scientific models describe natural phenomena at different levels of abstraction. Abstract descriptions can provide the basis for interventions on the system and explanation of observed phenomena at a level of granularity that is coarser than the most fundamental account of the system. Beckers and Halpern (2019), building on prior work of Rubinstein et al. (2017), developed an account of abstraction for causal models that is exact.
Here we extend this account to the more realistic case where an abstract causal model only offers an approximation of the underlying system. We show how the resulting account handles the discrepancy that can arise between low- and high-level causal models of the same system, and in the process provide an account of how one causal model approximates another, a topic of independent interest. Finally, we extend the account of approximate abstractions to probabilistic causal models, indicating how and where uncertainty can enter into an approximate abstraction.' volume: 115 URL: https://proceedings.mlr.press/v115/beckers20a.html PDF: http://proceedings.mlr.press/v115/beckers20a/beckers20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-beckers20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Sander family: Beckers - given: Frederick family: Eberhardt - given: Joseph Y. family: Halpern editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 606-615 id: beckers20a issued: date-parts: - 2020 - 8 - 6 firstpage: 606 lastpage: 615 published: 2020-08-06 00:00:00 +0000 - title: 'The Sensitivity of Counterfactual Fairness to Unmeasured Confounding' abstract: 'Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions in causal modeling is that you know the causal graph. This introduces a new opportunity for bias, caused by misspecifying the causal model. One common way for misspecification to occur is via unmeasured confounding: the true causal effect between variables is partially described by unobserved quantities. In this work we design tools to assess the sensitivity of fairness measures to this confounding for the popular class of non-linear additive noise models (ANMs). Specifically, we give a procedure for computing the maximum difference between two counterfactually fair predictors, where one has become biased due to confounding. For the case of bivariate confounding our technique can be swiftly computed via a sequence of closed-form updates. For multivariate confounding we give an algorithm that can be efficiently solved via automatic differentiation. We demonstrate our new sensitivity analysis tools in real-world fairness scenarios to assess the bias arising from confounding.' volume: 115 URL: https://proceedings.mlr.press/v115/kilbertus20a.html PDF: http://proceedings.mlr.press/v115/kilbertus20a/kilbertus20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-kilbertus20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Niki family: Kilbertus - given: Philip J. family: Ball - given: Matt J. family: Kusner - given: Adrian family: Weller - given: Ricardo family: Silva editor: - given: Ryan P. 
family: Adams - given: Vibhav family: Gogate page: 616-626 id: kilbertus20a issued: date-parts: - 2020 - 8 - 6 firstpage: 616 lastpage: 626 published: 2020-08-06 00:00:00 +0000 - title: 'Belief Propagation: Accurate Marginals or Accurate Partition Function – Where is the Difference?' abstract: 'We analyze belief propagation on patch potential models – these are attractive models with varying local potentials – obtain all of the possibly many fixed points, and gather novel insights into belief propagation’s properties. In particular, we observe and theoretically explain several regions in the parameter space that behave fundamentally differently. We specify and elaborate on one specific region that, despite the existence of multiple fixed points, is relatively well behaved and provides insights into the relationship between the accuracy of the marginals and the partition function. We demonstrate that no principled relationship exists between the two quantities and provide sufficient conditions for a fixed point to be optimal with respect to approximating both the marginals and the partition function.' volume: 115 URL: https://proceedings.mlr.press/v115/knoll20a.html PDF: http://proceedings.mlr.press/v115/knoll20a/knoll20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-knoll20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Christian family: Knoll - given: Franz family: Pernkopf editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 627-636 id: knoll20a issued: date-parts: - 2020 - 8 - 6 firstpage: 627 lastpage: 636 published: 2020-08-06 00:00:00 +0000 - title: 'Finding Minimal d-separators in Linear Time and Applications' abstract: 'The study of graphical causal models is fundamentally the study of separations and conditional independences. We provide linear time algorithms for two graphical primitives: to test if a given set is a minimal d-separator, and to find a minimal d-separator in directed acyclic graphs (DAGs), completed partially directed acyclic graphs (CPDAGs) and restricted chain graphs (RCGs) as well as minimal m-separators in ancestral graphs (AGs). These algorithms improve the runtime of the best previously known algorithms for minimal separators that are based on moralization and thus require quadratic time to construct and handle the moral graph. (Minimal) separating sets have important applications like finding (minimal) covariate adjustment sets or conditional instrumental variables.' volume: 115 URL: https://proceedings.mlr.press/v115/van-der-zander20a.html PDF: http://proceedings.mlr.press/v115/van-der-zander20a/van-der-zander20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-van-der-zander20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Benito prefix: van der family: Zander - given: Maciej family: Liśkiewicz editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 637-647 id: van-der-zander20a issued: date-parts: - 2020 - 8 - 6 firstpage: 637 lastpage: 647 published: 2020-08-06 00:00:00 +0000 - title: 'A Bayesian Approach to Robust Reinforcement Learning' abstract: 'Robust Markov Decision Processes (RMDPs) aim to ensure robustness with respect to changing or adversarial system behavior.
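For context on the van-der-zander20a result, the classic moralization route to testing d-separation, the quadratic-time baseline their linear-time algorithms improve upon, fits in a short function:

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """Classic test: restrict to ancestors of X|Y|Z, moralize, delete Z,
    then X and Y are d-separated iff they are disconnected.
    `parents` maps a node to its set of parents; X, Y, Z are disjoint sets."""
    anc, stack = set(), list(X | Y | Z)
    while stack:                                  # ancestral closure
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    adj = {n: set() for n in anc}
    for n in anc:                                 # moralization
        ps = [p for p in parents.get(n, ()) if p in anc]
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for i in range(len(ps)):                  # marry co-parents
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    seen, q = set(X), deque(X)                    # search avoiding Z
    while q:
        n = q.popleft()
        if n in Y:
            return False
        for nb in adj[n] - Z:
            if nb not in seen:
                seen.add(nb); q.append(nb)
    return True

parents = {"C": {"X", "Y"}}                       # collider X -> C <- Y
assert d_separated(parents, {"X"}, {"Y"}, set())
assert not d_separated(parents, {"X"}, {"Y"}, {"C"})
```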
In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set and a robust optimal policy can be derived under the worst-case scenario. In this study, we address the issue of learning in RMDPs using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation (URBE) which encourages safe exploration for adapting the uncertainty set to new observations while preserving robustness. We propose a URBE-based algorithm, DQN-URBE, that scales this method to higher dimensional domains. Our experiments show that the derived URBE-based strategy leads to a better trade-off between less conservative solutions and robustness in the presence of model misspecification. In addition, we show that the DQN-URBE algorithm can adapt significantly faster to changing dynamics online compared to existing robust techniques with fixed uncertainty sets.' volume: 115 URL: https://proceedings.mlr.press/v115/derman20a.html PDF: http://proceedings.mlr.press/v115/derman20a/derman20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-derman20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Esther family: Derman - given: Daniel family: Mankowitz - given: Timothy family: Mann - given: Shie family: Mannor editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 648-658 id: derman20a issued: date-parts: - 2020 - 8 - 6 firstpage: 648 lastpage: 658 published: 2020-08-06 00:00:00 +0000 - title: 'Adaptivity and Optimality: A Universal Algorithm for Online Convex Optimization' abstract: 'In this paper, we study adaptive online convex optimization, and aim to design a universal algorithm that achieves optimal regret bounds for multiple common types of loss functions. Existing universal methods are limited in the sense that they are optimal for only a subclass of loss functions. To address this limitation, we propose a novel online algorithm, namely Maler, which enjoys the optimal $O(\sqrt{T})$, $O(d\log T)$ and $O(\log T)$ regret bounds for general convex, exponentially concave, and strongly convex functions respectively. The essential idea is to run multiple types of learning algorithms with different learning rates in parallel, and utilize a meta-algorithm to track the best on the fly. Empirical results demonstrate the effectiveness of our method.' volume: 115 URL: https://proceedings.mlr.press/v115/wang20e.html PDF: http://proceedings.mlr.press/v115/wang20e/wang20e.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-wang20e.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Guanghui family: Wang - given: Shiyin family: Lu - given: Lijun family: Zhang editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 659-668 id: wang20e issued: date-parts: - 2020 - 8 - 6 firstpage: 659 lastpage: 668 published: 2020-08-06 00:00:00 +0000 - title: 'Evacuate or Not? A POMDP Model of the Decision Making of Individuals in Hurricane Evacuation Zones' abstract: 'Recent hurricanes in the Atlantic region of the southern United States triggered a series of evacuation orders in the coastal cities of Florida, Georgia, and Texas. While some of these urged voluntary evacuations, most were mandatory orders. 
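The "experts plus meta-algorithm" recipe in the wang20e (Maler) abstract can be caricatured as follows: several online-gradient-descent experts with different step sizes run in parallel, and a Hedge-style meta-learner tracks the best one. This is a schematic sketch only; Maler's surrogate losses and its optimal rates differ from this toy.

```python
import numpy as np

def universal_oco(grad, loss, T=500, d=2, etas=(0.01, 0.1, 1.0), meta_lr=0.1):
    """Experts = online gradient descent with different step sizes;
    meta = exponential weights over the experts' own losses."""
    K = len(etas)
    xs = np.zeros((K, d))                 # one iterate per expert
    w = np.ones(K) / K                    # meta weights
    total = 0.0
    for t in range(1, T + 1):
        total += loss(w @ xs)             # play the weighted combination
        for k, eta in enumerate(etas):    # expert updates
            xs[k] -= eta / np.sqrt(t) * grad(xs[k])
        w *= np.exp(-meta_lr * np.array([loss(x) for x in xs]))
        w /= w.sum()
    return total

g = lambda x: 2 * (x - 1.0)               # fixed quadratic loss each round
f = lambda x: float(np.sum((x - 1.0) ** 2))
print(universal_oco(g, f))                # cumulative loss stays small
```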
Despite governments asking people to vacate their homes for their own safety, many do not. We aim to understand the observable and hidden variables involved in the decision-making process and model these in a partially observable Markov decision process, which predicts whether a person will evacuate or not given his or her current situation. We consider the features of the particular hurricane, the dynamic situation that the individual is experiencing, and demographic factors that influence the decision making of individuals. The process model is represented as a dynamic influence diagram and evaluated on data collected via a comprehensive survey of hurricane-impacted individuals.' volume: 115 URL: https://proceedings.mlr.press/v115/sankar20a.html PDF: http://proceedings.mlr.press/v115/sankar20a/sankar20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-sankar20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Adithya Raam family: Sankar - given: Prashant family: Doshi - given: Adam family: Goodie editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 669-678 id: sankar20a issued: date-parts: - 2020 - 8 - 6 firstpage: 669 lastpage: 678 published: 2020-08-06 00:00:00 +0000 - title: 'Probabilistic Programming for Birth-Death Models of Evolution Using an Alive Particle Filter with Delayed Sampling' abstract: 'We consider probabilistic programming for birth-death models of evolution and introduce a new widely-applicable inference method that combines an extension of the alive particle filter (APF) with automatic Rao-Blackwellization via delayed sampling. Birth-death models of evolution are an important family of phylogenetic models of the diversification processes that lead to evolutionary trees. Probabilistic programming languages (PPLs) give phylogeneticists a new and exciting tool: their models can be implemented as probabilistic programs with just a basic knowledge of programming. The general inference methods in PPLs reduce the need for external experts, allow quick prototyping and testing, and accelerate the development and deployment of new models. We show how these birth-death models can be implemented as simple programs in existing PPLs, and demonstrate the usefulness of the proposed inference method for such models. For the popular BiSSE model the method yields an increase of the effective sample size and the conditional acceptance rate by a factor of 30 in comparison with a standard bootstrap particle filter. Although concentrating on phylogenetics, the extended APF is a general inference method that shows its strength in situations where particles are often assigned zero weight. In the case when the weights are always positive, the extra cost of using the APF rather than the bootstrap particle filter is negligible, making our method a suitable drop-in replacement for the bootstrap particle filter in probabilistic programming inference.' volume: 115 URL: https://proceedings.mlr.press/v115/kudlicka20a.html PDF: http://proceedings.mlr.press/v115/kudlicka20a/kudlicka20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-kudlicka20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Jan family: Kudlicka - given: Lawrence M. 
family: Murray - given: Fredrik family: Ronquist - given: Thomas B. family: Schön editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 679-689 id: kudlicka20a issued: date-parts: - 2020 - 8 - 6 firstpage: 679 lastpage: 689 published: 2020-08-06 00:00:00 +0000 - title: 'Variational Sparse Coding' abstract: 'Unsupervised discovery of interpretable features and controllable generation with high-dimensional data are currently major challenges in machine learning, with applications in data visualisation, clustering and artificial data synthesis. We propose a model based on variational auto-encoders (VAEs) in which interpretation is induced through latent space sparsity with a mixture of Spike and Slab distributions as prior. We derive an evidence lower bound for this model and propose a specific training method for recovering disentangled features as sparse elements in latent vectors. In our experiments, we demonstrate superior disentanglement performance to standard VAE approaches when an estimate of the number of true sources of variation is not available and objects display different combinations of attributes. Furthermore, the new model provides unique capabilities, such as recovering feature exploitation, synthesising samples that share attributes with a given input object and controlling both discrete and continuous features upon generation.' volume: 115 URL: https://proceedings.mlr.press/v115/tonolini20a.html PDF: http://proceedings.mlr.press/v115/tonolini20a/tonolini20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-tonolini20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Francesco family: Tonolini - given: Bjørn Sand family: Jensen - given: Roderick family: Murray-Smith editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 690-700 id: tonolini20a issued: date-parts: - 2020 - 8 - 6 firstpage: 690 lastpage: 700 published: 2020-08-06 00:00:00 +0000 - title: 'Learning with Non-Convex Truncated Losses by SGD' abstract: 'Learning with a convex loss function has been a dominating paradigm for many years. It remains an interesting question how non-convex loss functions can help improve the generalization of learning with broad applicability. In this paper, we study a family of objective functions formed by truncating traditional loss functions, which is applicable to both shallow learning and deep learning. Truncated loss functions have the potential to be less vulnerable and more robust to large noise in observations that could be adversarial. More importantly, truncation is a generic technique that does not assume knowledge of the noise distribution. To justify non-convex learning with truncated losses, we establish excess risk bounds of empirical risk minimization based on truncated losses for heavy-tailed output, and the statistical error of an approximate stationary point found by the stochastic gradient descent (SGD) method. Our experiments on shallow and deep learning for regression with outliers, corrupted data and heavy-tailed noise further justify the proposed method.'
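As a rough illustration of the truncation idea described in the abstract above, here is a minimal sketch of SGD on a hard-truncated squared loss, min(0.5(w·x − y)², τ); the paper's actual truncation family, step sizes, and analysis may differ, and all names here are illustrative.

```python
import numpy as np

def truncated_sq_loss_grad(w, x, y, tau):
    """Gradient of min(0.5 * (x @ w - y)**2, tau) with respect to w.
    Points whose loss exceeds the truncation level tau sit on the flat
    part of the objective and contribute no gradient, so large
    (possibly adversarial) outliers are effectively ignored."""
    r = x @ w - y
    if 0.5 * r ** 2 >= tau:
        return np.zeros_like(w)
    return r * x

def sgd_truncated(X, Y, tau=1.0, lr=0.01, epochs=20, seed=0):
    """Plain SGD over the truncated objective."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(Y)):
            w -= lr * truncated_sq_loss_grad(w, X[i], Y[i], tau)
    return w
```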
volume: 115 URL: https://proceedings.mlr.press/v115/xu20b.html PDF: http://proceedings.mlr.press/v115/xu20b/xu20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-xu20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Yi family: Xu - given: Shenghuo family: Zhu - given: Sen family: Yang - given: Chi family: Zhang - given: Rong family: Jin - given: Tianbao family: Yang editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 701-711 id: xu20b issued: date-parts: - 2020 - 8 - 6 firstpage: 701 lastpage: 711 published: 2020-08-06 00:00:00 +0000 - title: 'Active Multi-Information Source Bayesian Quadrature' abstract: 'Bayesian quadrature (BQ) is a sample-efficient probabilistic numerical method to solve integrals of expensive-to-evaluate black-box functions, yet so far, active BQ learning schemes focus merely on the integrand itself as the information source, and do not allow for information transfer from cheaper, related functions. Here, we set the scene for active learning in BQ when multiple related information sources of variable cost (in input and source) are accessible. This setting arises for example when evaluating the integrand requires a complex simulation to be run that can be approximated by simulating at lower levels of sophistication and at lesser expense. We construct meaningful cost-sensitive multi-source acquisition rates as an extension to common utility functions from vanilla BQ (VBQ), and discuss pitfalls that arise from blindly generalizing. In proof-of-concept experiments, we scrutinize the behavior of our generalized acquisition functions. On an epidemiological model, we demonstrate that active multi-source BQ (AMS-BQ) is more cost-efficient than VBQ in learning the integral to good accuracy.' volume: 115 URL: https://proceedings.mlr.press/v115/gessner20a.html PDF: http://proceedings.mlr.press/v115/gessner20a/gessner20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-gessner20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Alexandra family: Gessner - given: Javier family: Gonzalez - given: Maren family: Mahsereci editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 712-721 id: gessner20a issued: date-parts: - 2020 - 8 - 6 firstpage: 712 lastpage: 721 published: 2020-08-06 00:00:00 +0000 - title: 'Cascading Linear Submodular Bandits: Accounting for Position Bias and Diversity in Online Learning to Rank' abstract: 'Online learning, position bias, and diversified retrieval are three crucial aspects in designing ranking systems based on user clicks. One simple click model which explains the position bias is the cascade model. Many online learning variants of the cascade model have been proposed, but none so far captures diversity. Similarly, online learning to rank methods which capture diversity do not handle position bias. Motivated by these limitations, we propose a novel click model and the associated online learning variant to address both position bias and diversified retrieval in a unified framework. Despite the objective function not being a standard submodular set function, we prove that a greedy approach is a reasonable approximation.
We then propose CascadeLSB, an algorithm whose goal is to recommend the $K$ most attractive items from a large set of $L$ items, with the attractiveness of items modeled as a submodular utility function. We analyze the algorithm and prove a gap-free upper bound on its $n$-step regret. We comprehensively evaluate CascadeLSB on synthetic and real datasets, where it outperforms all the baselines.' volume: 115 URL: https://proceedings.mlr.press/v115/hiranandani20a.html PDF: http://proceedings.mlr.press/v115/hiranandani20a/hiranandani20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-hiranandani20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Gaurush family: Hiranandani - given: Harvineet family: Singh - given: Prakhar family: Gupta - given: Iftikhar Ahamath family: Burhanuddin - given: Zheng family: Wen - given: Branislav family: Kveton editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 722-732 id: hiranandani20a issued: date-parts: - 2020 - 8 - 6 firstpage: 722 lastpage: 732 published: 2020-08-06 00:00:00 +0000 - title: 'Sinkhorn AutoEncoders' abstract: 'Optimal transport offers an alternative to maximum likelihood for learning generative autoencoding models. We show that minimizing the $p$-Wasserstein distance between the generator and the true data distribution is equivalent to the unconstrained min-min optimization of the $p$-Wasserstein distance between the encoder aggregated posterior and the prior in latent space, plus a reconstruction error. We also identify the role of its trade-off hyperparameter as the capacity of the generator: its Lipschitz constant. Moreover, we prove that optimizing the encoder over any class of universal approximators, such as deterministic neural networks, is enough to come arbitrarily close to the optimum. We therefore advertise this framework, which holds for any metric space and prior, as a sweet spot of current generative autoencoding objectives. We then introduce the Sinkhorn autoencoder (SAE), which approximates and minimizes the $p$-Wasserstein distance in latent space via backpropagation through the Sinkhorn algorithm. SAE directly works on samples, i.e. it models the aggregated posterior as an implicit distribution, with no need for a reparameterization trick for gradient estimation. SAE is thus able to work with different metric spaces and priors with minimal adaptations. We demonstrate the flexibility of SAE on latent spaces with different geometries and priors and compare with other methods on benchmark data sets.' volume: 115 URL: https://proceedings.mlr.press/v115/patrini20a.html PDF: http://proceedings.mlr.press/v115/patrini20a/patrini20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-patrini20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Giorgio family: Patrini - given: Rianne prefix: van den family: Berg - given: Patrick family: Forré - given: Marcello family: Carioni - given: Samarth family: Bhargav - given: Max family: Welling - given: Tim family: Genewein - given: Frank family: Nielsen editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 733-743 id: patrini20a issued: date-parts: - 2020 - 8 - 6 firstpage: 733 lastpage: 743 published: 2020-08-06 00:00:00 +0000 - title: 'How to Exploit Structure while Solving Weighted Model Integration Problems' abstract: 'Weighted model counting (WMC) is a state-of-the-art technique for probabilistic inference in discrete domains. WMC has recently been extended towards weighted model integration (WMI) in order to handle discrete and continuous distributions alike. While a number of WMI solvers have been introduced, their relationships, strengths and weaknesses are not yet well understood. WMI solving consists of two sub-problems: 1) finding convex polytopes; and 2) integrating over them efficiently. We formalize the first step as $\lambda$-SMT and discuss what strategies solvers apply to solve both the $\lambda$-SMT and the integration problem. This formalization allows us to compare state-of-the-art solvers and their behaviour across different types of WMI problems. Moreover, we identify factorizability of WMI problems as a key property that emerges in the context of probabilistic programming. Problems that can be factorized can be solved more efficiently. However, current solvers exploiting this property restrict themselves to WMI problems with univariate conditions and fully factorizable weight functions. We introduce a new algorithm, F-XSDD, that lifts these restrictions and can exploit factorizability in WMI problems with multivariate conditions and partially factorizable weight functions. Through an empirical evaluation, we show the effectiveness of our approach.' volume: 115 URL: https://proceedings.mlr.press/v115/kolb20a.html PDF: http://proceedings.mlr.press/v115/kolb20a/kolb20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-kolb20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Samuel family: Kolb - given: Pedro Zuidberg Dos family: Martires - given: Luc family: De Raedt editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 744-754 id: kolb20a issued: date-parts: - 2020 - 8 - 6 firstpage: 744 lastpage: 754 published: 2020-08-06 00:00:00 +0000 - title: 'Multi-Class Gaussian Process Classification Made Conjugate: Efficient Inference via Data Augmentation' abstract: 'We propose a new scalable multi-class Gaussian process classification approach building on a novel modified softmax likelihood function. The new likelihood has two benefits: it leads to well-calibrated uncertainty estimates and allows for an efficient latent variable augmentation. The augmented model has the advantage that it is conditionally conjugate, leading to a fast variational inference method via block coordinate ascent updates. Previous approaches suffered from a trade-off between uncertainty calibration and speed. Our experiments show that our method leads to well-calibrated uncertainty estimates and competitive predictive performance while being up to two orders of magnitude faster than the state of the art.'
volume: 115 URL: https://proceedings.mlr.press/v115/galy-fajou20a.html PDF: http://proceedings.mlr.press/v115/galy-fajou20a/galy-fajou20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-galy-fajou20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Théo family: Galy-Fajou - given: Florian family: Wenzel - given: Christian family: Donner - given: Manfred family: Opper editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 755-765 id: galy-fajou20a issued: date-parts: - 2020 - 8 - 6 firstpage: 755 lastpage: 765 published: 2020-08-06 00:00:00 +0000 - title: 'A Flexible Framework for Multi-Objective Bayesian Optimization using Random Scalarizations' abstract: 'Many real-world applications can be framed as multi-objective optimization (MOO) problems, where we wish to simultaneously optimize for multiple criteria. Bayesian optimization techniques for the multi-objective setting are pertinent when the evaluation of the functions in question is expensive. Traditional methods for multi-objective optimization, both Bayesian and otherwise, are aimed at recovering the Pareto front of these objectives. However, in certain cases a practitioner might desire to identify Pareto optimal points only in a subset of the Pareto front due to external considerations. In this work, we propose a strategy based on random scalarizations of the objectives that addresses this problem. Our approach is able to flexibly sample from desired regions of the Pareto front and, computationally, is considerably cheaper than most approaches for MOO. We also study a notion of regret in the multi-objective setting and show that our strategy achieves sublinear regret. We experiment with both synthetic and real-life problems, and demonstrate superior performance of our proposed algorithm in terms of both flexibility and regret.' volume: 115 URL: https://proceedings.mlr.press/v115/paria20a.html PDF: http://proceedings.mlr.press/v115/paria20a/paria20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-paria20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Biswajit family: Paria - given: Kirthevasan family: Kandasamy - given: Barnabás family: Póczos editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 766-776 id: paria20a issued: date-parts: - 2020 - 8 - 6 firstpage: 766 lastpage: 776 published: 2020-08-06 00:00:00 +0000 - title: 'Efficient Multitask Feature and Relationship Learning' abstract: 'We consider a multitask learning problem, in which several predictors are learned jointly. Prior research has shown that learning the relations between tasks, and between the input features, together with the predictor, can lead to better generalization and interpretability, which has proved useful for applications in many domains. In this paper, we consider a formulation of multitask learning that learns the relationships both between tasks and between features, represented through a task covariance and a feature covariance matrix, respectively. First, we demonstrate that existing methods proposed for this problem present an issue that may lead to ill-posed optimization. We then propose an alternative formulation, as well as an efficient algorithm to optimize it.
Using ideas from optimization and graph theory, we propose an efficient coordinate-wise minimization algorithm that has a closed form solution for each block subproblem. Our experiments show that the proposed optimization method is orders of magnitude faster than its competitors. We also provide a nonlinear extension that is able to achieve better generalization than existing methods.' volume: 115 URL: https://proceedings.mlr.press/v115/zhao20a.html PDF: http://proceedings.mlr.press/v115/zhao20a/zhao20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-zhao20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Han family: Zhao - given: Otilia family: Stretcu - given: Alexander J. family: Smola - given: Geoffrey J. family: Gordon editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 777-787 id: zhao20a issued: date-parts: - 2020 - 8 - 6 firstpage: 777 lastpage: 787 published: 2020-08-06 00:00:00 +0000 - title: 'Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning' abstract: 'Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises relief using cheaper proxies to such objectives — for example, validation error for a network trained using a subset of the training points or fewer iterations than required for convergence. We propose a highly flexible and practical approach to multi-fidelity Bayesian optimization, focused on efficiently optimizing hyperparameters for iteratively trained supervised learning models. We introduce a new acquisition function, the trace-aware knowledge-gradient, which efficiently leverages both multiple continuous fidelity controls and trace observations — values of the objective at a sequence of fidelities, available when varying fidelity using training iterations. We provide a provably convergent method for optimizing our acquisition function and show it outperforms state-of-the-art alternatives for hyperparameter tuning of deep neural networks and large-scale kernel learning.' volume: 115 URL: https://proceedings.mlr.press/v115/wu20a.html PDF: http://proceedings.mlr.press/v115/wu20a/wu20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-wu20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Jian family: Wu - given: Saul family: Toscano-Palmerin - given: Peter I. family: Frazier - given: Andrew Gordon family: Wilson editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 788-798 id: wu20a issued: date-parts: - 2020 - 8 - 6 firstpage: 788 lastpage: 798 published: 2020-08-06 00:00:00 +0000 - title: 'Adaptively Truncating Backpropagation Through Time to Control Gradient Bias' abstract: 'Truncated backpropagation through time (TBPTT) is a popular method for learning in recurrent neural networks (RNNs) that saves computation and memory at the cost of bias by truncating backpropagation after a fixed number of lags. 
In practice, choosing the optimal truncation length is difficult: TBPTT will not converge if the truncation length is too small, or will converge slowly if it is too large. We propose an adaptive TBPTT scheme that converts the problem from choosing a temporal lag to one of choosing a tolerable amount of gradient bias. For many realistic RNNs, the TBPTT gradients decay geometrically in expectation for large lags; under this condition, we can control the bias by varying the truncation length adaptively. For RNNs with smooth activation functions, we prove that this bias controls the convergence rate of SGD with biased gradients for our non-convex loss. Using this theory, we develop a practical method for adaptively estimating the truncation length during training. We evaluate our adaptive TBPTT method on synthetic data and language modeling tasks and find that it ameliorates the computational pitfalls of fixed TBPTT.' volume: 115 URL: https://proceedings.mlr.press/v115/aicher20a.html PDF: http://proceedings.mlr.press/v115/aicher20a/aicher20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-aicher20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Christopher family: Aicher - given: Nicholas J. family: Foti - given: Emily B. family: Fox editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 799-808 id: aicher20a issued: date-parts: - 2020 - 8 - 6 firstpage: 799 lastpage: 808 published: 2020-08-06 00:00:00 +0000 - title: 'Sampling-free Uncertainty Estimation in Gated Recurrent Units with Applications to Normative Modeling in Neuroimaging' abstract: 'There has recently been a concerted effort to derive mechanisms in vision and machine learning systems to offer uncertainty estimates of the predictions they make. Clearly, there are enormous benefits to a system that is not only accurate but also has a sense for when it is not. Existing proposals center around Bayesian interpretations of modern deep architectures; these are effective but can often be computationally demanding. We show how classical ideas in the literature on exponential families on probabilistic networks provide an excellent starting point to derive uncertainty estimates in Gated Recurrent Units (GRU). Our proposal directly quantifies uncertainty deterministically, without the need for costly sampling-based estimation. While uncertainty is quite useful by itself in computer vision and machine learning, we also demonstrate that it can play a key role in enabling statistical analysis with deep networks in neuroimaging studies with normative modeling methods. To our knowledge, this is the first result describing sampling-free uncertainty estimation for powerful sequential models such as GRUs.' volume: 115 URL: https://proceedings.mlr.press/v115/hwang20a.html PDF: http://proceedings.mlr.press/v115/hwang20a/hwang20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-hwang20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Seong Jae family: Hwang - given: Ronak R. family: Mehta - given: Hyunwoo J. family: Kim - given: Sterling C. family: Johnson - given: Vikas family: Singh editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 809-819 id: hwang20a issued: date-parts: - 2020 - 8 - 6 firstpage: 809 lastpage: 819 published: 2020-08-06 00:00:00 +0000 - title: 'Online Factorization and Partition of Complex Networks by Random Walk' abstract: 'Finding the reduced-dimensional structure is critical to understanding complex networks. Existing approaches such as spectral clustering are applicable only when the full network is explicitly observed. In this paper, we focus on the online factorization and partition of implicit large lumpable networks based on observations from an associated random walk. We formulate this as a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm (GHA). The algorithm is able to process random walk data in a streaming fashion and learn a low-dimensional representation for each vertex. By applying a diffusion approximation analysis, we show that the continuous-time limiting process of the stochastic algorithm converges globally to the “principal components” of the Markov chain. We also establish a finite-sample error bound that matches the non-improvable state-of-the-art result for online factorization. Once the low-dimensional state representations are learned, we further apply clustering techniques to recover the network partition. We show that when the associated Markov process is lumpable, one can recover the partition exactly with high probability given sufficient data. We apply the proposed approach to model the traffic flow of Manhattan as city-wide random walks. By using our algorithm to analyze the taxi trip data, we discover a latent partition of Manhattan that closely matches the traffic dynamics.' volume: 115 URL: https://proceedings.mlr.press/v115/yang20a.html PDF: http://proceedings.mlr.press/v115/yang20a/yang20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-yang20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Lin family: Yang - given: Zheng family: Yu - given: Vladimir family: Braverman - given: Tuo family: Zhao - given: Mengdi family: Wang editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 820-830 id: yang20a issued: date-parts: - 2020 - 8 - 6 firstpage: 820 lastpage: 830 published: 2020-08-06 00:00:00 +0000 - title: 'On Densification for Minwise Hashing' abstract: 'One Permutation Hashing (OPH) is a significantly more efficient alternative to the popular minwise hashing. To produce a sketch of size $k$, OPH requires just one hash function whereas the classical minwise hashing requires $k$ hash functions. However, OPH does not have the desirable locality sensitive hashing (LSH) property that is important for indexing. [Srivastava and Li 2014, ICML] proposed the novel idea of densification to produce LSH sketches from OPH sketches, and gave the first densification routine. In this paper, we give a necessary and sufficient condition for a densification routine to result in LSH sketches when applied to OPH sketches. Furthermore, we give a novel densification routine that for every input takes $O(k \log k)$ time in expectation and achieves better variance than the previous best bound obtained by [Srivastava 2017]. The running time of the densification routine given in [Srivastava 2017] for worst-case inputs is $O(k^2)$ in expectation.'
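For context on the densification step discussed above, the following is a minimal sketch of one permutation hashing followed by a simple rotation-style densification in the spirit of Srivastava and Li [2014]; the routine proposed in this paper is different (it runs in $O(k \log k)$ expected time and achieves better variance), and the hash construction below is only illustrative.

```python
import random

def oph_sketch(items, k, seed=0):
    """One Permutation Hashing: hash every item once, split the hash
    range into k bins, and keep the minimum hashed value per bin.
    Bins that receive no item are left as None."""
    salt = random.Random(seed).getrandbits(32)
    sketch = [None] * k
    for it in items:
        h = hash((salt, it)) & 0xFFFFFFFF  # illustrative 32-bit hash
        b, v = h % k, h // k               # bin index, value within the bin
        if sketch[b] is None or v < sketch[b]:
            sketch[b] = v
    return sketch

def densify(sketch):
    """Fill each empty bin with the value of the nearest non-empty bin
    to its right (circularly), so the sketch regains the LSH property.
    Assumes at least one bin is non-empty."""
    k = len(sketch)
    dense = list(sketch)
    for b in range(k):
        j = b
        while sketch[j] is None:
            j = (j + 1) % k
        dense[b] = sketch[j]
    return dense
```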
volume: 115 URL: https://proceedings.mlr.press/v115/mai20a.html PDF: http://proceedings.mlr.press/v115/mai20a/mai20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-mai20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Tung family: Mai - given: Anup family: Rao - given: Matt family: Kapilevich - given: Ryan family: Rossi - given: Yasin family: Abbasi-Yadkori - given: Ritwik family: Sinha editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 831-840 id: mai20a issued: date-parts: - 2020 - 8 - 6 firstpage: 831 lastpage: 840 published: 2020-08-06 00:00:00 +0000 - title: 'N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification' abstract: 'Graph Convolutional Networks (GCNs) have shown significant improvements in semi-supervised learning on graph-structured data. Concurrently, unsupervised learning of graph embeddings has benefited from the information contained in random walks. In this paper, we propose a model: Network of GCNs (N-GCN), which marries these two lines of work. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs which optimizes the classification objective. Our experiments show that our proposed N-GCN model improves state-of-the-art baselines on all of the challenging node classification tasks we consider: Cora, Citeseer, Pubmed, and PPI. In addition, our proposed method has other desirable properties, including generalization to recently proposed semi-supervised learning methods such as GraphSAGE, allowing us to propose N-SAGE, and resilience to adversarial input perturbations.' volume: 115 URL: https://proceedings.mlr.press/v115/abu-el-haija20a.html PDF: http://proceedings.mlr.press/v115/abu-el-haija20a/abu-el-haija20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-abu-el-haija20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Sami family: Abu-El-Haija - given: Amol family: Kapoor - given: Bryan family: Perozzi - given: Joonseok family: Lee editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 841-851 id: abu-el-haija20a issued: date-parts: - 2020 - 8 - 6 firstpage: 841 lastpage: 851 published: 2020-08-06 00:00:00 +0000 - title: 'Problem-dependent Regret Bounds for Online Learning with Feedback Graphs' abstract: 'This paper addresses the stochastic multi-armed bandit problem with an undirected feedback graph. We devise a UCB-based algorithm, UCB-NE, to provide a problem-dependent regret bound that depends on a clique covering. Our algorithm obtains regret which provably scales linearly with the clique covering number. Additionally, we provide problem-dependent regret bounds for a Thompson Sampling-based algorithm, TS-N, where again the bounds are linear in the clique covering number. Finally, we present experimental results to see how UCB-NE, TS-N, and a few related algorithms perform practically.' 
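The clique-covering index of UCB-NE is developed in the paper; as a generic illustration of the side-observation setting it targets, here is a minimal sketch of UCB-style learning with an undirected feedback graph, where playing an arm also reveals reward samples for its neighbors (the exploration constant and interfaces are illustrative assumptions, not the paper's algorithm).

```python
import math
import random

def ucb_with_graph_feedback(arms, neighbors, sample, horizon, c=2.0):
    """arms: list of arm ids; neighbors[a]: arms adjacent to a;
    sample(a): draws a stochastic reward in [0, 1] for arm a.
    Each play of an arm reveals rewards for it and all its neighbors,
    which is what lets regret scale with a clique covering of the
    graph rather than with the number of arms."""
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for t in range(1, horizon + 1):
        unseen = [a for a in arms if counts[a] == 0]
        if unseen:
            a = random.choice(unseen)
        else:
            a = max(arms, key=lambda b: means[b] + math.sqrt(c * math.log(t) / counts[b]))
        for b in {a, *neighbors[a]}:  # graph feedback: free side observations
            r = sample(b)
            counts[b] += 1
            means[b] += (r - means[b]) / counts[b]
    return means
```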
volume: 115 URL: https://proceedings.mlr.press/v115/hu20b.html PDF: http://proceedings.mlr.press/v115/hu20b/hu20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-hu20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Bingshan family: Hu - given: Nishant A. family: Mehta - given: Jianping family: Pan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 852-861 id: hu20b issued: date-parts: - 2020 - 8 - 6 firstpage: 852 lastpage: 861 published: 2020-08-06 00:00:00 +0000 - title: 'Wasserstein Fair Classification' abstract: 'We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances. The approach has desirable theoretical properties and is robust to specific choices of the threshold used to obtain class predictions from model outputs. We introduce different methods that enable hiding sensitive information at test time or have a simple and fast implementation. We show empirical performance against different fairness baselines on several benchmark fairness datasets.' volume: 115 URL: https://proceedings.mlr.press/v115/jiang20a.html PDF: http://proceedings.mlr.press/v115/jiang20a/jiang20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-jiang20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Ray family: Jiang - given: Aldo family: Pacchiano - given: Tom family: Stepleton - given: Heinrich family: Jiang - given: Silvia family: Chiappa editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 862-872 id: jiang20a issued: date-parts: - 2020 - 8 - 6 firstpage: 862 lastpage: 872 published: 2020-08-06 00:00:00 +0000 - title: 'Variational Training for Large-Scale Noisy-OR Bayesian Networks' abstract: 'We propose a stochastic variational inference algorithm for training large-scale Bayesian networks, where noisy-OR conditional distributions are used to capture higher-order relationships. One application is to the learning of hierarchical topic models for text data. While previous work has focused on two-layer networks popular in applications like medical diagnosis, we develop scalable algorithms for deep networks that capture a multi-level hierarchy of interactions. Our key innovation is a family of constrained variational bounds that only explicitly optimize posterior probabilities for the sub-graph of topics most related to the sparse observations in a given document. These constrained bounds have comparable accuracy but dramatically reduced computational cost. Using stochastic gradient updates based on our variational bounds, we learn noisy-OR Bayesian networks orders of magnitude faster than was possible with prior Monte Carlo learning algorithms, and provide a new tool for understanding large-scale binary data.'
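The noisy-OR conditional distribution underlying the model above has a simple closed form, which the following sketch evaluates; the leak probability and weight names are illustrative, not the paper's notation.

```python
import numpy as np

def noisy_or_prob(parent_states, weights, leak=0.01):
    """P(child = 1 | parents) under a noisy-OR CPD: each active parent i
    independently fails to turn the child on with probability 1 - w_i,
    so P(child = 0 | x) = (1 - leak) * prod_i (1 - w_i)**x_i."""
    x = np.asarray(parent_states, dtype=float)
    w = np.asarray(weights, dtype=float)
    p_off = (1.0 - leak) * np.prod((1.0 - w) ** x)
    return 1.0 - p_off

# Two active causes with weights 0.8 and 0.5 and a 1% leak:
# noisy_or_prob([1, 1, 0], [0.8, 0.5, 0.9]) == 1 - 0.99 * 0.2 * 0.5 == 0.901
```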
volume: 115 URL: https://proceedings.mlr.press/v115/ji20a.html PDF: http://proceedings.mlr.press/v115/ji20a/ji20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-ji20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Geng family: Ji - given: Dehua family: Cheng - given: Huazhong family: Ning - given: Changhe family: Yuan - given: Hanning family: Zhou - given: Liang family: Xiong - given: Erik B. family: Sudderth editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 873-882 id: ji20a issued: date-parts: - 2020 - 8 - 6 firstpage: 873 lastpage: 882 published: 2020-08-06 00:00:00 +0000 - title: 'Guaranteed Scalable Learning of Latent Tree Models' abstract: 'We present an integrated approach to structure and parameter estimation in latent tree graphical models, where some nodes are hidden. Our overall approach follows a “divide-and-conquer” strategy that learns models over small groups of variables and iteratively merges them into a global solution. The structure learning involves combinatorial operations such as minimum spanning tree construction and local recursive grouping; the parameter learning is based on the method of moments and on tensor decompositions. Our method is guaranteed to correctly recover the unknown tree structure and the model parameters with low sample complexity for the class of linear multivariate latent tree models, which includes discrete and Gaussian distributions, and Gaussian mixtures. Our bulk-asynchronous parallel algorithm scales logarithmically with the number of variables and linearly with the dimensionality of each variable.' volume: 115 URL: https://proceedings.mlr.press/v115/huang20b.html PDF: http://proceedings.mlr.press/v115/huang20b/huang20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-huang20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Furong family: Huang - given: Niranjan Uma family: Naresh - given: Ioakeim family: Perros - given: Robert family: Chen - given: Jimeng family: Sun - given: Anima family: Anandkumar editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 883-893 id: huang20b issued: date-parts: - 2020 - 8 - 6 firstpage: 883 lastpage: 893 published: 2020-08-06 00:00:00 +0000 - title: 'On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits' abstract: 'We make three contributions to the theory of $k$-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst-case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance of the regret is typically $\Omega(n^2)$, where $n$ is the horizon. Finally, we study bounds that depend on the degree of separation of the arms, generalising the results of Cowan and Katehakis [2015] from the stochastic setting to the adversarial one, and improving the result of Seldin and Slivkins [2014] by a factor of $\log(n)/\log(\log(n))$.'
volume: 115 URL: https://proceedings.mlr.press/v115/pogodin20a.html PDF: http://proceedings.mlr.press/v115/pogodin20a/pogodin20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-pogodin20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Roman family: Pogodin - given: Tor family: Lattimore editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 894-904 id: pogodin20a issued: date-parts: - 2020 - 8 - 6 firstpage: 894 lastpage: 904 published: 2020-08-06 00:00:00 +0000 - title: 'Noise Contrastive Priors for Functional Uncertainty' abstract: 'Obtaining reliable uncertainty estimates of neural network predictions is a long-standing challenge. Bayesian neural networks have been proposed as a solution, but it remains open how to specify their prior. In particular, the common practice of an independent normal prior in weight space imposes relatively weak constraints on the function posterior, allowing it to generalize in unforeseen ways on inputs outside of the training distribution. We propose noise contrastive priors (NCPs) to obtain reliable uncertainty estimates. The key idea is to train the model to output high uncertainty for data points outside of the training distribution. NCPs do so using an input prior, which adds noise to the inputs of the current mini-batch, and an output prior, which is a wide distribution given these inputs. NCPs are compatible with any model that can output uncertainty estimates, are easy to scale, and yield reliable uncertainty estimates throughout training. Empirically, we show that NCPs prevent overfitting outside of the training distribution and result in uncertainty estimates that are useful for active learning. We demonstrate the scalability of our method on the flight delays data set, where we significantly improve upon previously published results.' volume: 115 URL: https://proceedings.mlr.press/v115/hafner20a.html PDF: http://proceedings.mlr.press/v115/hafner20a/hafner20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-hafner20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Danijar family: Hafner - given: Dustin family: Tran - given: Timothy family: Lillicrap - given: Alex family: Irpan - given: James family: Davidson editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 905-914 id: hafner20a issued: date-parts: - 2020 - 8 - 6 firstpage: 905 lastpage: 914 published: 2020-08-06 00:00:00 +0000 - title: 'Fake It Till You Make It: Learning-Compatible Performance Support' abstract: 'A longstanding goal of artificial intelligence is to develop technologies that augment or assist humans. Current approaches to developing agents that can assist humans focus on adapting the behavior of the assistant, and do not consider the potential for assistants to support human learning. We argue that in many cases it is worthwhile to provide assistance in a manner that also promotes task learning or skill maintenance. We term such assistance Learning-Compatible Performance Support, and present the Stochastic Q Bumpers algorithm for greatly improving learning outcomes while still providing high levels of performance support.
We demonstrate the effectiveness of our approach in multiple domains, including a complex flight control task.' volume: 115 URL: https://proceedings.mlr.press/v115/bragg20a.html PDF: http://proceedings.mlr.press/v115/bragg20a/bragg20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-bragg20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Jonathan family: Bragg - given: Emma family: Brunskill editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 915-924 id: bragg20a issued: date-parts: - 2020 - 8 - 6 firstpage: 915 lastpage: 924 published: 2020-08-06 00:00:00 +0000 - title: 'Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning' abstract: 'It is incredibly easy for a system designer to misspecify the objective for an autonomous system ("robot"), thus motivating the desire to have the robot learn the objective from human behavior instead. Recent work has suggested that people have an interest in the robot performing well, and will thus behave pedagogically, choosing actions that are informative to the robot. In turn, robots benefit from interpreting the behavior by accounting for this pedagogy. In this work, we focus on misspecification: we argue that robots might not know whether people are being pedagogic or literal and that it is important to ask which assumption is safer to make. We cast objective learning into the more general form of a common-payoff game between the robot and human, and prove that in any such game literal interpretation is more robust to misspecification. Experiments with human data support our theoretical results and point to the sensitivity of the pedagogic assumption.' volume: 115 URL: https://proceedings.mlr.press/v115/milli20a.html PDF: http://proceedings.mlr.press/v115/milli20a/milli20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-milli20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Smitha family: Milli - given: Anca D. family: Dragan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 925-934 id: milli20a issued: date-parts: - 2020 - 8 - 6 firstpage: 925 lastpage: 934 published: 2020-08-06 00:00:00 +0000 - title: 'Convergence Analysis of Gradient-Based Learning in Continuous Games' abstract: 'Considering a class of gradient-based multi-agent learning algorithms in non-cooperative settings, we provide convergence guarantees to a neighborhood of a stable Nash equilibrium. In particular, we consider continuous games where agents learn in 1) deterministic settings with oracle access to their gradient and 2) stochastic settings with an unbiased estimator of their gradient. We also study the effects of non-uniform learning rates, which cause a distortion of the vector field that can alter which equilibrium the agents converge to and the path they take. We support the analysis with numerical examples that provide insight into how one might synthesize games to achieve desired equilibria.'
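As a concrete rendering of the deterministic oracle-gradient setting analyzed above, here is a minimal sketch of simultaneous individual gradient play with non-uniform learning rates; the cost functions, step sizes, and stopping rule are illustrative assumptions.

```python
import numpy as np

def gradient_play(grad_fns, x0, lrs, steps=1000):
    """Each agent i descends the gradient of its own cost with its own
    learning rate lrs[i]; non-uniform rates distort the combined vector
    field and can change which equilibrium the dynamics settle on."""
    x = [np.asarray(xi, dtype=float) for xi in x0]
    for _ in range(steps):
        grads = [g(x) for g in grad_fns]  # oracle access to each agent's gradient
        x = [xi - lr * gi for xi, lr, gi in zip(x, lrs, grads)]
    return x
```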
volume: 115 URL: https://proceedings.mlr.press/v115/chasnov20a.html PDF: http://proceedings.mlr.press/v115/chasnov20a/chasnov20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-chasnov20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Benjamin family: Chasnov - given: Lillian family: Ratliff - given: Eric family: Mazumdar - given: Samuel family: Burden editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 935-944 id: chasnov20a issued: date-parts: - 2020 - 8 - 6 firstpage: 935 lastpage: 944 published: 2020-08-06 00:00:00 +0000 - title: 'End-to-end Training of Deep Probabilistic CCA on Paired Biomedical Observations' abstract: 'Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we develop a multimodal modeling and inference framework that estimates shared latent structure of joint gene expression levels and medical image features. Our method is built around probabilistic canonical correlation analysis (PCCA), which is fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. Using a differentiable take on the EM algorithm, we train the model end-to-end so that the PCCA and neural network parameters are estimated simultaneously. We demonstrate the utility of this method in constructing image features that are predictive of gene expression levels on simulated data and the Genotype-Tissue Expression data. We demonstrate that the latent variables are interpretable by disentangling the latent subspace through shared and modality-specific views.' volume: 115 URL: https://proceedings.mlr.press/v115/gundersen20a.html PDF: http://proceedings.mlr.press/v115/gundersen20a/gundersen20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-gundersen20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Gregory family: Gundersen - given: Bianca family: Dumitrascu - given: Jordan T. family: Ash - given: Barbara E. family: Engelhardt editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 945-955 id: gundersen20a issued: date-parts: - 2020 - 8 - 6 firstpage: 945 lastpage: 955 published: 2020-08-06 00:00:00 +0000 - title: 'Approximate Relative Value Learning for Average-reward Continuous State MDPs' abstract: 'In this paper, we propose an approximate relative value learning (ARVL) algorithm for nonparametric MDPs with continuous state space, finite actions, and the average-reward criterion. It is a sampling-based algorithm combined with kernel density estimation and function approximation via nearest neighbors. The theoretical analysis is done via a random contraction operator framework and a stochastic dominance argument. To the best of our knowledge, this is the first such algorithm for continuous state space MDPs with the average-reward criterion that has these provable properties and does not require any discretization of the state space. We then evaluate the proposed algorithm on a benchmark problem numerically.'
volume: 115 URL: https://proceedings.mlr.press/v115/sharma20a.html PDF: http://proceedings.mlr.press/v115/sharma20a/sharma20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-sharma20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Hiteshi family: Sharma - given: Mehdi family: Jafarnia-Jahromi - given: Rahul family: Jain editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 956-964 id: sharma20a issued: date-parts: - 2020 - 8 - 6 firstpage: 956 lastpage: 964 published: 2020-08-06 00:00:00 +0000 - title: 'Exact Sampling of Directed Acyclic Graphs from Modular Distributions' abstract: 'We consider the problem of sampling directed acyclic graphs (DAGs) from a given distribution. We assume the sampling distribution is modular, i.e., the probability of a DAG is a product of local factors, each of which only depends on a node and its parents in the DAG. Using inclusion–exclusion recurrence relations, we give an exact sampler that requires $\tilde{O}(3^n)$ time for preprocessing and $\tilde{O}(2^n)$ per sample, where $n$ is the number of nodes and $\tilde{O}$ suppresses polylogarithmic factors. We also consider the symmetric special case where the factors only depend on the size of the parent set—this covers uniform sampling under indegree constraints. In this case, our exact sampler requires $O(n^3)$ time for preprocessing and $O(n^2)$ per sample; this outperforms the previous best bound even for the uniform distribution. We also demonstrate the performance of both samplers empirically.' volume: 115 URL: https://proceedings.mlr.press/v115/talvitie20a.html PDF: http://proceedings.mlr.press/v115/talvitie20a/talvitie20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-talvitie20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Topi family: Talvitie - given: Aleksis family: Vuoksenmaa - given: Mikko family: Koivisto editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 965-974 id: talvitie20a issued: date-parts: - 2020 - 8 - 6 firstpage: 965 lastpage: 974 published: 2020-08-06 00:00:00 +0000 - title: 'Intervening on Network Ties' abstract: 'A foundational tool for making causal inferences is the emulation of randomized control trials via variable interventions. This approach has been applied to a wide variety of contexts, from health to economics [3, 6]. Variable interventions have long been studied in independent and identically distributed (iid) data contexts, but recently non-iid settings, such as networks with interacting agents [8, 17, 28], have attracted interest. In this paper, we propose a type of structural intervention [12] relevant in network contexts: the network intervention. Rather than estimating the effect of changing variables, we consider changes to social network structure resulting from creation or severance of ties between agents. We define the individual participant and average bystander effects for these interventions and describe identification criteria. We then prove a series of theoretical results that show that existing identification theory obtains minimally KL-divergent distributions corresponding to network interventions. Finally, we demonstrate estimation of effects of network interventions via a simulation study.'
volume: 115 URL: https://proceedings.mlr.press/v115/sherman20a.html PDF: http://proceedings.mlr.press/v115/sherman20a/sherman20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-sherman20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Eli family: Sherman - given: Ilya family: Shpitser editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 975-984 id: sherman20a issued: date-parts: - 2020 - 8 - 6 firstpage: 975 lastpage: 984 published: 2020-08-06 00:00:00 +0000 - title: 'Generating and Sampling Orbits for Lifted Probabilistic Inference' abstract: 'A key goal in the design of probabilistic inference algorithms is identifying and exploiting properties of the distribution that make inference tractable. Lifted inference algorithms identify symmetry as a property that enables efficient inference and seek to scale with the degree of symmetry of a probability model. A limitation of existing exact lifted inference techniques is that they do not apply to non-relational representations like factor graphs. In this work, we provide the first example of an exact lifted inference algorithm for arbitrary discrete factor graphs. In addition, we describe a lifted Markov chain Monte Carlo algorithm that provably mixes rapidly in the degree of symmetry of the distribution.' volume: 115 URL: https://proceedings.mlr.press/v115/holtzen20a.html PDF: http://proceedings.mlr.press/v115/holtzen20a/holtzen20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-holtzen20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Steven family: Holtzen - given: Todd family: Millstein - given: Guy family: Van den Broeck editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 985-994 id: holtzen20a issued: date-parts: - 2020 - 8 - 6 firstpage: 985 lastpage: 994 published: 2020-08-06 00:00:00 +0000 - title: 'Real-Time Robotic Search using Structural Spatial Point Processes' abstract: 'Aerial robots hold great potential for aiding Search and Rescue (SAR) efforts over large areas, such as during natural disasters. Traditional approaches typically search an area exhaustively, thereby ignoring that the density of victims varies based on predictable factors, such as the terrain, population density and the type of disaster. We present a probabilistic model to automate SAR planning, with explicit minimization of the expected time to discovery. The proposed model is a spatial point process with three interacting spatial fields for i) the point patterns of persons in the area, ii) the probability of detecting persons and iii) the probability of injury. This structure allows inclusion of informative priors from e.g. geographic or cell phone traffic data, while falling back to latent Gaussian processes when priors are missing or inaccurate. To solve this problem in real-time, we propose a combination of fast approximate inference using Integrated Nested Laplace Approximation (INLA), and a novel Monte Carlo tree search tailored to the problem. Experiments using data simulated from real-world Geographic Information System (GIS) maps show that the framework outperforms competing approaches, finding many more injured in the crucial first hours.'
volume: 115 URL: https://proceedings.mlr.press/v115/andersson20a.html PDF: http://proceedings.mlr.press/v115/andersson20a/andersson20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-andersson20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Olov family: Andersson - given: Per family: Sidén - given: Johan family: Dahlin - given: Patrick family: Doherty - given: Mattias family: Villani editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 995-1005 id: andersson20a issued: date-parts: - 2020 - 8 - 6 firstpage: 995 lastpage: 1005 published: 2020-08-06 00:00:00 +0000 - title: 'Social Reinforcement Learning to Combat Fake News Spread' abstract: 'In this work, we develop a social reinforcement learning approach to combat the spread of fake news. Specifically, we aim to learn an intervention model to promote the spread of true news in a social network—in order to mitigate the impact of fake news. We model news diffusion as a Multivariate Hawkes Process (MHP) and make interventions that are learnt via policy optimization. The key insight is to estimate the response a user will get from the social network upon sharing a post, as it indicates her impact on diffusion, and will thus help in the efficient allocation of incentives. User responses also depend on political bias and peer influence, which we model as a second MHP, interleaving it with the news diffusion process. We evaluate our model on semi-synthetic and real-world data. The results demonstrate that our proposed model outperforms other alternatives that do not consider estimates of user responses and political bias when learning how to allocate incentives.' volume: 115 URL: https://proceedings.mlr.press/v115/goindani20a.html PDF: http://proceedings.mlr.press/v115/goindani20a/goindani20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-goindani20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Mahak family: Goindani - given: Jennifer family: Neville editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1006-1016 id: goindani20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1006 lastpage: 1016 published: 2020-08-06 00:00:00 +0000 - title: 'P3O: Policy-on Policy-off Policy Optimization' abstract: 'On-policy reinforcement learning (RL) algorithms have high sample complexity while off-policy algorithms are difficult to tune. Merging the two holds the promise of developing efficient algorithms that generalize across diverse environments. It is, however, challenging in practice to find suitable hyper-parameters that govern this trade-off. This paper develops a simple algorithm named P3O that interleaves off-policy updates with on-policy updates. P3O uses the effective sample size between the behavior policy and the target policy to control how far they can be from each other and does not introduce any additional hyper-parameters. Extensive experiments on the Atari-2600 and MuJoCo benchmark suites show that this simple technique is effective in reducing the sample complexity of state-of-the-art algorithms. Code to reproduce experiments in this paper is at https://github.com/rasoolfa/P3O.'
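P3O's interleaving is governed by the effective sample size between the behavior and target policies; a minimal sketch of the standard normalized ESS estimate computed from importance weights follows (how P3O folds this quantity into its updates is specified in the paper).

```python
import numpy as np

def effective_sample_size(logp_target, logp_behavior):
    """Normalized ESS in (0, 1] of the importance weights
    w_i = pi_target(a_i | s_i) / pi_behavior(a_i | s_i):
        ESS = (sum_i w_i)**2 / (n * sum_i w_i**2).
    Values near 1 indicate the two policies are close, so off-policy
    samples remain informative; small values signal weight degeneracy."""
    w = np.exp(np.asarray(logp_target) - np.asarray(logp_behavior))
    return float(w.sum() ** 2 / (len(w) * (w ** 2).sum()))
```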
volume: 115 URL: https://proceedings.mlr.press/v115/fakoor20a.html PDF: http://proceedings.mlr.press/v115/fakoor20a/fakoor20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-fakoor20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Rasool family: Fakoor - given: Pratik family: Chaudhari - given: Alexander J. family: Smola editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1017-1027 id: fakoor20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1017 lastpage: 1027 published: 2020-08-06 00:00:00 +0000 - title: 'Causal Inference Under Interference And Network Uncertainty' abstract: 'Classical causal and statistical inference methods typically assume the observed data consists of independent realizations. However, in many applications this assumption is inappropriate due to a network of dependences between units in the data. Methods for estimating causal effects have been developed in the setting where the structure of dependence between units is known exactly, but in practice there is often substantial uncertainty about the precise network structure. This is true, for example, in trial data drawn from vulnerable communities where social ties are difficult to query directly. In this paper we combine techniques from the structure learning and interference literatures in causal inference, proposing a general method for estimating causal effects under data dependence when the structure of this dependence is not known a priori. We demonstrate the utility of our method on synthetic datasets which exhibit network dependence.' volume: 115 URL: https://proceedings.mlr.press/v115/bhattacharya20a.html PDF: http://proceedings.mlr.press/v115/bhattacharya20a/bhattacharya20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-bhattacharya20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Rohit family: Bhattacharya - given: Daniel family: Malinsky - given: Ilya family: Shpitser editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1028-1038 id: bhattacharya20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1028 lastpage: 1038 published: 2020-08-06 00:00:00 +0000 - title: 'Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow' abstract: 'Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables. Amortized gradient-based learning of SCFMs is challenging, as most approaches targeting discrete variables rely on their continuous relaxations, which can be intractable in SCFMs because branching on relaxations requires evaluating all (exponentially many) branching paths. Tractable alternatives mainly combine REINFORCE with complex control-variate schemes to reduce the variance of naive estimators. Here, we revisit the reweighted wake-sleep (RWS) [5] algorithm, and through extensive evaluations, show that it outperforms current state-of-the-art methods in learning SCFMs. Further, in contrast to the importance weighted autoencoder, we observe that RWS learns better models and inference networks with increasing numbers of particles. Our results suggest that RWS is a competitive, often preferable, alternative for learning SCFMs.'
volume: 115 URL: https://proceedings.mlr.press/v115/le20a.html PDF: http://proceedings.mlr.press/v115/le20a/le20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-le20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Tuan Anh family: Le - given: Adam R. family: Kosiorek - given: N. family: Siddharth - given: Yee Whye family: Teh - given: Frank family: Wood editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1039-1049 id: le20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1039 lastpage: 1049 published: 2020-08-06 00:00:00 +0000 - title: 'Learnability for the Information Bottleneck' abstract: 'The Information Bottleneck (IB) method (Tishby et al. (2000)) provides an insightful and principled approach for balancing compression and prediction for representation learning. The IB objective I(X;Z) − βI(Y;Z) employs a Lagrange multiplier β to tune this trade-off. However, in practice, not only is β chosen empirically without theoretical guidance, but there is also a lack of theoretical understanding of the relationship between β, learnability, the intrinsic nature of the dataset, and model capacity. In this paper, we show that if β is improperly chosen, learning cannot happen: the trivial representation P(Z|X) = P(Z) becomes the global minimum of the IB objective. We show how this can be avoided by identifying a sharp phase transition between the unlearnable and the learnable regimes that arises as β is varied. This phase transition defines the concept of IB-Learnability. We prove several sufficient conditions for IB-Learnability, which provide theoretical guidance for choosing a good β. We further show that IB-Learnability is determined by the largest confident, typical, and imbalanced subset of the examples (the conspicuous subset), and discuss its relation to model capacity. We give practical algorithms to estimate the minimum β for a given dataset. We also empirically demonstrate our theoretical conditions with analyses of synthetic datasets, MNIST, and CIFAR10.' volume: 115 URL: https://proceedings.mlr.press/v115/wu20b.html PDF: http://proceedings.mlr.press/v115/wu20b/wu20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-wu20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Tailin family: Wu - given: Ian family: Fischer - given: Isaac L. family: Chuang - given: Max family: Tegmark editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1050-1060 id: wu20b issued: date-parts: - 2020 - 8 - 6 firstpage: 1050 lastpage: 1060 published: 2020-08-06 00:00:00 +0000 - title: 'Learning Belief Representations for Imitation Learning in POMDPs' abstract: 'We consider the problem of imitation learning from expert demonstrations in partially observable Markov decision processes (POMDPs). Belief representations, which characterize the distribution over the latent states in a POMDP, have been modeled using recurrent neural networks and probabilistic latent variable models, and shown to be effective for reinforcement learning in POMDPs. In this work, we investigate the belief representation learning problem for generative adversarial imitation learning in POMDPs.
Instead of training the belief module and the policy separately as suggested in prior work, we learn the belief module jointly with the policy, using a task-aware imitation loss to ensure that the representation is more aligned with the policy’s objective. To improve the robustness of the representation, we introduce several informative belief regularization techniques, including multi-step prediction of dynamics and action sequences. Evaluated on various partially observable continuous-control locomotion tasks, our belief-module imitation learning approach (BMIL) substantially outperforms several baselines, including the original GAIL algorithm and the task-agnostic belief learning algorithm. Extensive ablation analysis indicates the effectiveness of task-aware belief learning and belief regularization. Code for the project is available online.' volume: 115 URL: https://proceedings.mlr.press/v115/gangwani20a.html PDF: http://proceedings.mlr.press/v115/gangwani20a/gangwani20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-gangwani20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Tanmay family: Gangwani - given: Joel family: Lehman - given: Qiang family: Liu - given: Jian family: Peng editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1061-1071 id: gangwani20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1061 lastpage: 1071 published: 2020-08-06 00:00:00 +0000 - title: 'Object Conditioning for Causal Inference' abstract: 'We describe and analyze a form of conditioning that is widely applied within social science and applied statistics but that is virtually unknown within causal graphical models. This approach, which we term object conditioning, can adjust for the effects of latent confounders and yet avoid the pitfall of conditioning on colliders. We describe object conditioning using plate models and show how its probabilistic implications can be explained using the property of exchangeability. We show that several seemingly obvious interpretations of object conditioning are insufficient to describe its probabilistic implications. Finally, we use object conditioning to describe and unify key aspects of a diverse set of techniques for causal inference, including within-subjects designs, difference-in-differences designs, and interrupted time-series designs.' volume: 115 URL: https://proceedings.mlr.press/v115/jensen20a.html PDF: http://proceedings.mlr.press/v115/jensen20a/jensen20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-jensen20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: David family: Jensen - given: Javier family: Burroni - given: Matthew family: Rattigan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1072-1082 id: jensen20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1072 lastpage: 1082 published: 2020-08-06 00:00:00 +0000 - title: 'CCMI : Classifier based Conditional Mutual Information Estimation' abstract: 'Conditional Mutual Information (CMI) is a measure of conditional dependence between random variables X and Y, given another random variable Z.
It can be used to quantify conditional dependence among variables in many data-driven inference problems such as graphical models, causal learning, feature selection and time-series analysis. While k-nearest neighbor (kNN) based estimators as well as kernel-based methods have been widely used for CMI estimation, they suffer severely from the curse of dimensionality. In this paper, we leverage advances in classifiers and generative models to design methods for CMI estimation. Specifically, we introduce an estimator for the KL divergence based on the likelihood ratio, by training a classifier to distinguish the observed joint distribution from the product distribution. We then show how to construct several CMI estimators using this basic divergence estimator by drawing ideas from conditional generative models. We demonstrate that the estimates from our proposed approaches do not degrade in performance with increasing dimension and obtain a significant improvement over the widely used KSG estimator. Finally, as an application of accurate CMI estimation, we use our best estimator for conditional independence testing and outperform the state-of-the-art tester on both simulated and real datasets.' volume: 115 URL: https://proceedings.mlr.press/v115/mukherjee20a.html PDF: http://proceedings.mlr.press/v115/mukherjee20a/mukherjee20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-mukherjee20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Sudipto family: Mukherjee - given: Himanshu family: Asnani - given: Sreeram family: Kannan editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1083-1093 id: mukherjee20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1083 lastpage: 1093 published: 2020-08-06 00:00:00 +0000 - title: 'Empirical Mechanism Design: Designing Mechanisms from Data' abstract: 'We introduce a methodology for the design of parametric mechanisms, which are multiagent systems inhabited by strategic agents, with knobs that can be adjusted to achieve specific goals. We assume agents play approximate equilibria, which we estimate using the probably approximately correct learning framework. Under this assumption, we further learn approximately optimal mechanism parameters. We do this theoretically, assuming a finite design space, and heuristically, using Bayesian optimization (BO). Our BO algorithm incorporates the noise associated with modern concentration inequalities, such as Hoeffding’s, into the underlying Gaussian process. We show experimentally that our search techniques outperform standard baselines in a stylized but rich model of advertisement exchanges.' volume: 115 URL: https://proceedings.mlr.press/v115/viqueira20a.html PDF: http://proceedings.mlr.press/v115/viqueira20a/viqueira20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-viqueira20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Enrique Areyan family: Viqueira - given: Cyrus family: Cousins - given: Yasser family: Mohammad - given: Amy family: Greenwald editor: - given: Ryan P.
family: Adams - given: Vibhav family: Gogate page: 1094-1104 id: viqueira20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1094 lastpage: 1104 published: 2020-08-06 00:00:00 +0000 - title: 'On the Relationship Between Satisfiability and Markov Decision Processes' abstract: 'Stochastic satisfiability (SSAT) and decision-theoretic planning in finite horizon partially observable Markov decision processes (POMDPs) are both PSPACE-complete. We describe constructive reductions between SSAT and flat POMDPs that open the door to comparisons and future cross-fertilization between the solution techniques of those problems. We also propose a new SSAT solver called Prime that incorporates recent advances from the SAT and #SAT literature. Using our reduction from POMDP to SSAT, we demonstrate the competitiveness of Prime on finite horizon POMDP problems.' volume: 115 URL: https://proceedings.mlr.press/v115/salmon20a.html PDF: http://proceedings.mlr.press/v115/salmon20a/salmon20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-salmon20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Ricardo family: Salmon - given: Pascal family: Poupart editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1105-1115 id: salmon20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1105 lastpage: 1115 published: 2020-08-06 00:00:00 +0000 - title: 'Interpretable Almost Matching Exactly With Instrumental Variables' abstract: 'Uncertainty in the estimation of the causal effect in observational studies is often due to unmeasured confounding, i.e., the presence of unobserved covariates linking treatments and outcomes. Instrumental Variables (IV) are commonly used to reduce the effects of unmeasured confounding. Existing methods for IV estimation either require strong parametric assumptions, use arbitrary distance metrics, or do not scale well to large datasets. We propose a matching framework for IV in the presence of observed categorical confounders that addresses these weaknesses. Our method first matches units exactly, and then consecutively drops variables to approximately match the remaining units on as many variables as possible. We show that our algorithm constructs better matches than other existing methods on simulated datasets, and we produce interesting results in an application to political canvassing.' volume: 115 URL: https://proceedings.mlr.press/v115/awan20a.html PDF: http://proceedings.mlr.press/v115/awan20a/awan20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-awan20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: M. Usaid family: Awan - given: Yameng family: Liu - given: Marco family: Morucci - given: Sudeepa family: Roy - given: Cynthia family: Rudin - given: Alexander family: Volfovsky editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1116-1126 id: awan20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1116 lastpage: 1126 published: 2020-08-06 00:00:00 +0000 - title: 'Low Frequency Adversarial Perturbation' abstract: 'Adversarial images aim to change a target model’s decision by minimally perturbing a target image.
In the black-box setting, the absence of gradient information often renders this search problem costly in terms of query complexity. In this paper we propose to restrict the search for adversarial images to a low frequency domain. This approach is readily compatible with many existing black-box attack frameworks and consistently reduces their query cost by 2 to 4 times. Further, we can circumvent image transformation defenses even when both the model and the defense strategy are unknown. Finally, we demonstrate the efficacy of this technique by fooling the Google Cloud Vision platform with an unprecedentedly low number of model queries.' volume: 115 URL: https://proceedings.mlr.press/v115/guo20a.html PDF: http://proceedings.mlr.press/v115/guo20a/guo20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-guo20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Chuan family: Guo - given: Jared S. family: Frank - given: Kilian Q. family: Weinberger editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1127-1137 id: guo20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1127 lastpage: 1137 published: 2020-08-06 00:00:00 +0000 - title: 'Markov Logic Networks for Knowledge Base Completion: A Theoretical Analysis Under the MCAR Assumption' abstract: 'We study the following question. We are given a knowledge base in which some facts are missing. We learn the weights of a Markov logic network using maximum likelihood estimation on this knowledge base and then use the learned Markov logic network to predict the missing facts. Assuming that the facts are missing independently and with the same probability, can we say that this approach is consistent in some precise sense? This is a non-trivial question because we are learning from only one training example. In this paper we show that the answer to this question is positive.' volume: 115 URL: https://proceedings.mlr.press/v115/kuzelka20a.html PDF: http://proceedings.mlr.press/v115/kuzelka20a/kuzelka20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-kuzelka20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Ondřej family: Kuželka - given: Jesse family: Davis editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1138-1148 id: kuzelka20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1138 lastpage: 1148 published: 2020-08-06 00:00:00 +0000 - title: 'Identification In Missing Data Models Represented By Directed Acyclic Graphs' abstract: 'Missing data is a pervasive problem in data analyses, resulting in datasets that contain censored realizations of a target distribution. Many approaches to inference on the target distribution using censored observed data rely on missing data models represented as a factorization with respect to a directed acyclic graph. In this paper we consider the identifiability of the target distribution within this class of models, and show that the most general identification strategies proposed so far retain a significant gap in that they fail to identify a wide class of identifiable distributions.
To address this gap, we propose a new algorithm that significantly generalizes the types of manipulations used in the ID algorithm [14, 16], developed in the context of causal inference, in order to obtain identification.' volume: 115 URL: https://proceedings.mlr.press/v115/bhattacharya20b.html PDF: http://proceedings.mlr.press/v115/bhattacharya20b/bhattacharya20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-bhattacharya20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Rohit family: Bhattacharya - given: Razieh family: Nabi - given: Ilya family: Shpitser - given: James M. family: Robins editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1149-1158 id: bhattacharya20b issued: date-parts: - 2020 - 8 - 6 firstpage: 1149 lastpage: 1158 published: 2020-08-06 00:00:00 +0000 - title: 'A Weighted Mini-Bucket Bound for Solving Influence Diagrams' abstract: 'Influence diagrams provide a modeling and inference framework for sequential decision problems, representing the probabilistic knowledge by a Bayesian network and the preferences of an agent by utility functions over the random variables and decision variables. The time and space complexity of computing the maximum expected utility (MEU) and its maximizing policy are exponential in the induced width of the underlying graphical model, which is often prohibitively large. In this paper, we develop a weighted mini-bucket approach for bounding the MEU. These bounds can be used as a stand-alone approximation that can be improved as a function of a controlling i-bound parameter, or as a heuristic function to guide subsequent search. We evaluate the scheme empirically against the current state of the art, illustrating its potential.' volume: 115 URL: https://proceedings.mlr.press/v115/lee20c.html PDF: http://proceedings.mlr.press/v115/lee20c/lee20c.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-lee20c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Junkyu family: Lee - given: Radu family: Marinescu - given: Alexander family: Ihler - given: Rina family: Dechter editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1159-1168 id: lee20c issued: date-parts: - 2020 - 8 - 6 firstpage: 1159 lastpage: 1168 published: 2020-08-06 00:00:00 +0000 - title: 'Subspace Inference for Bayesian Deep Learning' abstract: 'Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well-calibrated uncertainty. However, scaling Bayesian inference techniques to deep neural networks is challenging due to the high dimensionality of the parameter space. In this paper, we construct low-dimensional subspaces of parameter space, such as the first principal components of the stochastic gradient descent (SGD) trajectory, which contain diverse sets of high-performing models. In these subspaces, we are able to apply elliptical slice sampling and variational inference, which struggle in the full parameter space. We show that Bayesian model averaging over the induced posterior in these subspaces produces accurate predictions and well-calibrated predictive uncertainty for both regression and image classification.'
volume: 115 URL: https://proceedings.mlr.press/v115/izmailov20a.html PDF: http://proceedings.mlr.press/v115/izmailov20a/izmailov20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-izmailov20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Pavel family: Izmailov - given: Wesley J. family: Maddox - given: Polina family: Kirichenko - given: Timur family: Garipov - given: Dmitry family: Vetrov - given: Andrew Gordon family: Wilson editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1169-1179 id: izmailov20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1169 lastpage: 1179 published: 2020-08-06 00:00:00 +0000 - title: 'Off-Policy Policy Gradient with Stationary Distribution Correction' abstract: 'We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method. Prior off-policy policy gradient approaches have generally ignored the mismatch between the distribution of states visited under the behavior policy used to collect data and the distribution of states that would be visited under the learned policy. Here we build on recent progress in estimating the ratio of the state distributions under behavior and evaluation policies for policy evaluation, and present an off-policy policy gradient optimization technique that can account for this mismatch in distributions. We present an illustrative example of why this is important and a theoretical convergence guarantee for our approach. Empirically, we compare our method in simulations to several strong baselines which do not correct for this mismatch, significantly improving the quality of the discovered policy.' volume: 115 URL: https://proceedings.mlr.press/v115/liu20a.html PDF: http://proceedings.mlr.press/v115/liu20a/liu20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-liu20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Yao family: Liu - given: Adith family: Swaminathan - given: Alekh family: Agarwal - given: Emma family: Brunskill editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1180-1190 id: liu20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1180 lastpage: 1190 published: 2020-08-06 00:00:00 +0000 - title: 'Co-training for Policy Learning' abstract: 'We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.'
volume: 115 URL: https://proceedings.mlr.press/v115/song20b.html PDF: http://proceedings.mlr.press/v115/song20b/song20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-song20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Jialin family: Song - given: Ravi family: Lanka - given: Yisong family: Yue - given: Masahiro family: Ono editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1191-1201 id: song20b issued: date-parts: - 2020 - 8 - 6 firstpage: 1191 lastpage: 1201 published: 2020-08-06 00:00:00 +0000 - title: 'Variational Inference of Penalized Regression with Submodular Functions' abstract: 'Various regularizers inducing structured sparsity are constructed as Lovász extensions of submodular functions. In this paper, we consider a hierarchical probabilistic model of linear regression and its kernel extension with this type of regularization, and develop a variational inference scheme for the posterior estimate on this model. We derive an upper bound on the partition function with an approximation guarantee, and then show that minimizing this bound is equivalent to the minimization of a quadratic function over the polyhedron determined by the corresponding submodular function, which can be solved efficiently by the proximal gradient algorithm. Our scheme gives a natural extension of the Bayesian Lasso model for the maximum a posteriori (MAP) estimation to a variety of regularizers inducing structured sparsity, and thus this work provides a principled way to transfer the advantages of the Bayesian formulation into those models. Finally, we investigate the empirical performance of our scheme with several Bayesian variants of widely known models such as Lasso, generalized fused Lasso, and non-overlapping group Lasso.' volume: 115 URL: https://proceedings.mlr.press/v115/takeuchi20a.html PDF: http://proceedings.mlr.press/v115/takeuchi20a/takeuchi20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-takeuchi20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Koh family: Takeuchi - given: Yuichi family: Yoshida - given: Yoshinobu family: Kawahara editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1202-1211 id: takeuchi20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1202 lastpage: 1211 published: 2020-08-06 00:00:00 +0000 - title: 'Probability Distillation: A Caveat and Alternatives' abstract: 'Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners: as a practical workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefficient. This means that KL-based "evaluative" training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an "instructive" signal, and show that it is possible to achieve qualitatively better results than with KL minimization.'
volume: 115 URL: https://proceedings.mlr.press/v115/huang20c.html PDF: http://proceedings.mlr.press/v115/huang20c/huang20c.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-huang20c.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Chin-Wei family: Huang - given: Faruk family: Ahmed - given: Kundan family: Kumar - given: Alexandre family: Lacoste - given: Aaron family: Courville editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1212-1221 id: huang20c issued: date-parts: - 2020 - 8 - 6 firstpage: 1212 lastpage: 1221 published: 2020-08-06 00:00:00 +0000 - title: 'Bayesian Optimization with Binary Auxiliary Information' abstract: 'This paper presents novel mixed-type Bayesian optimization (BO) algorithms to accelerate the optimization of a target objective function by exploiting correlated auxiliary information of binary type that can be more cheaply obtained, such as in policy search for reinforcement learning and hyperparameter tuning of machine learning models with early stopping. To achieve this, we first propose a mixed-type multi-output Gaussian process (MOGP) to jointly model the continuous target function and binary auxiliary functions. Then, we propose information-based acquisition functions such as mixed-type entropy search (MT-ES) and mixed-type predictive ES (MT-PES) for mixed-type BO based on the MOGP predictive belief of the target and auxiliary functions. The exact acquisition functions of MT-ES and MT-PES cannot be computed in closed form and need to be approximated. We derive an efficient approximation of MT-PES via a novel mixed-type random features approximation of the MOGP model, whose cross-correlation structure between the target and auxiliary functions can be exploited for improving the belief of the global target maximizer using the observations from evaluating these functions. We also propose new practical constraints to relate the global target maximizer to the binary auxiliary functions. We empirically evaluate the performance of MT-ES and MT-PES with synthetic and real-world experiments.' volume: 115 URL: https://proceedings.mlr.press/v115/zhang20b.html PDF: http://proceedings.mlr.press/v115/zhang20b/zhang20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-zhang20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Yehong family: Zhang - given: Zhongxiang family: Dai - given: Bryan Kian Hsiang family: Low editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1222-1232 id: zhang20b issued: date-parts: - 2020 - 8 - 6 firstpage: 1222 lastpage: 1232 published: 2020-08-06 00:00:00 +0000 - title: 'On Open-Universe Causal Reasoning' abstract: 'We extend two kinds of causal models, structural equation models and simulation models, to infinite variable spaces. This enables a semantics of counterfactuals, a calculus of intervention, and an axiomatization of causal reasoning for rich, expressive generative models (including those in which a causal representation exists only implicitly) in an open-universe setting. Further, we show that under suitable restrictions the two kinds of models are equivalent, perhaps surprisingly, since their conditional logics differ substantially in the general case.
We give a series of complete axiomatizations in which the open-universe nature of the setting is seen to be essential.' volume: 115 URL: https://proceedings.mlr.press/v115/ibeling20a.html PDF: http://proceedings.mlr.press/v115/ibeling20a/ibeling20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-ibeling20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Duligur family: Ibeling - given: Thomas family: Icard editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1233-1243 id: ibeling20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1233 lastpage: 1243 published: 2020-08-06 00:00:00 +0000 - title: 'Embarrassingly Parallel MCMC using Deep Invertible Transformations' abstract: 'While MCMC methods have become a main workhorse for Bayesian inference, scaling them to large distributed datasets is still a challenge. Embarrassingly parallel MCMC strategies take a divide-and-conquer stance to achieve this by writing the target posterior as a product of subposteriors, running MCMC for each of them in parallel and subsequently combining the results. The challenge then lies in devising efficient aggregation strategies. Current strategies trade off between approximation quality and the costs of communication and computation. In this work, we introduce a novel method that addresses these issues simultaneously. Our key insight is to introduce a deep invertible transformation to approximate each of the subposteriors. These approximations can be made accurate even for complex distributions and serve as intermediate representations, keeping the total communication cost limited. Moreover, they enable us to sample from the product of the subposteriors using an efficient and stable importance sampling scheme. We demonstrate that the approach outperforms available state-of-the-art methods in a range of challenging scenarios, including high-dimensional and heterogeneous subposteriors.' volume: 115 URL: https://proceedings.mlr.press/v115/mesquita20a.html PDF: http://proceedings.mlr.press/v115/mesquita20a/mesquita20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-mesquita20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Diego family: Mesquita - given: Paul family: Blomstedt - given: Samuel family: Kaski editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1244-1252 id: mesquita20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1244 lastpage: 1252 published: 2020-08-06 00:00:00 +0000 - title: 'Fast Proximal Gradient Descent for A Class of Non-convex and Non-smooth Sparse Learning Problems' abstract: 'Non-convex and non-smooth optimization problems are important for statistics and machine learning. However, solving such problems is always challenging. In this paper, we propose fast proximal gradient descent based methods to solve a class of non-convex and non-smooth sparse learning problems, i.e., the $\ell^0$ regularization problems. We prove an improved convergence rate of proximal gradient descent on the $\ell^0$ regularization problems, and propose two accelerated versions by support projection.
The proposed accelerated proximal gradient descent methods by support projection have convergence rates that match Nesterov’s optimal convergence rate for first-order methods on smooth, convex objective functions with Lipschitz continuous gradients. Experimental results demonstrate the effectiveness of the proposed algorithms. We also propose feedforward neural networks as fast encoders to approximate the optimization results generated by the proposed accelerated algorithms.' volume: 115 URL: https://proceedings.mlr.press/v115/yang20b.html PDF: http://proceedings.mlr.press/v115/yang20b/yang20b.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-yang20b.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Yingzhen family: Yang - given: Jiahui family: Yu editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1253-1262 id: yang20b issued: date-parts: - 2020 - 8 - 6 firstpage: 1253 lastpage: 1262 published: 2020-08-06 00:00:00 +0000 - title: 'Block Neural Autoregressive Flow' abstract: 'Normalising flows (NFs) map two density functions via a differentiable bijection whose Jacobian determinant can be computed efficiently. Recently, as an alternative to hand-crafted bijections, Huang et al. (2018) proposed neural autoregressive flow (NAF), which is a universal approximator for density functions. Their flow is a neural network (NN) whose parameters are predicted by another NN. The latter grows quadratically with the size of the former, and thus an efficient technique for parametrization is needed. We propose block neural autoregressive flow (B-NAF), a much more compact universal approximator of density functions, where we model a bijection directly using a single feed-forward network. Invertibility is ensured by carefully designing each affine transformation with block matrices that make the flow autoregressive and (strictly) monotone. We compare B-NAF to NAF and other established flows on density estimation and approximate inference for latent variable models. Our proposed flow is competitive across datasets while using orders of magnitude fewer parameters.' volume: 115 URL: https://proceedings.mlr.press/v115/de-cao20a.html PDF: http://proceedings.mlr.press/v115/de-cao20a/de-cao20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-de-cao20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Nicola family: De Cao - given: Wilker family: Aziz - given: Ivan family: Titov editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1263-1273 id: de-cao20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1263 lastpage: 1273 published: 2020-08-06 00:00:00 +0000 - title: 'Exclusivity Graph Approach to Instrumental Inequalities' abstract: 'Instrumental variables allow the estimation of cause and effect relations even in the presence of unobserved latent factors, thus providing a powerful tool for any science wherein causal inference plays an important role.
More recently, the instrumental scenario has also attracted increasing attention in quantum physics, since it is related to Bell’s seminal theorem and in fact allows the detection of even stronger quantum effects, thus enhancing our current capabilities to process information and becoming a valuable tool in quantum cryptography. In this work, we further explore this bridge between causality and quantum theory and apply a technique, originally developed in the field of quantum foundations, to express the constraints implied by causal relations in the language of graph theory. This new approach can be applied to any causal model containing a latent variable. Here, by focusing on the instrumental scenario, it allows us to easily reproduce known results as well as obtain new ones and gain new insights into the connections and differences between the instrumental and the Bell scenarios.' volume: 115 URL: https://proceedings.mlr.press/v115/poderini20a.html PDF: http://proceedings.mlr.press/v115/poderini20a/poderini20a.pdf edit: https://github.com/mlresearch//v115/edit/gh-pages/_posts/2020-08-06-poderini20a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of The 35th Uncertainty in Artificial Intelligence Conference' publisher: 'PMLR' author: - given: Davide family: Poderini - given: Rafael family: Chaves - given: Iris family: Agresti - given: Gonzalo family: Carvacho - given: Fabio family: Sciarrino editor: - given: Ryan P. family: Adams - given: Vibhav family: Gogate page: 1274-1283 id: poderini20a issued: date-parts: - 2020 - 8 - 6 firstpage: 1274 lastpage: 1283 published: 2020-08-06 00:00:00 +0000