Proceedings of Machine Learning Research

Proceedings of The 11th International Conference on Probabilistic Graphical Models
Held in Almería, Spain, on 05-07 October 2022. Published as Volume 186 by the Proceedings of Machine Learning Research on 19 September 2022.
Volume edited by: Antonio Salmerón, Rafael Rumí
Series Editors: Neil D. Lawrence
https://proceedings.mlr.press/v186/

Bounding Counterfactuals under Selection Bias
Causal analysis may be affected by selection bias, which is defined as the systematic exclusion of data from a certain subpopulation. Previous work in this area focused on the derivation of identifiability conditions. We propose instead a first algorithm to address both identifiable and unidentifiable queries. We prove that, in spite of the missingness induced by the selection bias, the likelihood of the available data is unimodal. This enables us to use the causal expectation-maximisation scheme to obtain the values of causal queries in the identifiable case, and to compute bounds otherwise. Experiments demonstrate the approach to be practically viable. Theoretical convergence characterisations are provided.
https://proceedings.mlr.press/v186/zaffalon22a.html

The Functional LiNGAM
We consider causal orders, i.e., the cause-and-effect relations among variables. In the Linear Non-Gaussian Acyclic Model (LiNGAM), we can only identify the order if at least one of the variables is non-Gaussian. This paper extends the notion of variables to functions (Functional Linear Non-Gaussian Acyclic Model, Func-LiNGAM). We first prove that we can identify the order among random functions if and only if one of them is a non-Gaussian process. In the actual procedure, we approximate the functions by random vectors.
To improve correctness and efficiency, we propose to optimize the coordinates of the vectors in a manner similar to functional principal component analysis. The experiments include an order identification simulation among multiple functions for given samples. In particular, we apply Func-LiNGAM to recognize brain connectivity patterns in fMRI data. We observe improvements in accuracy and execution speed compared with existing methods.
https://proceedings.mlr.press/v186/yang22a.html

Approximate Inference for Stochastic Planning in Factored Spaces
Stochastic planning can be reduced to probabilistic inference in large discrete graphical models, but the hardness of inference requires approximation schemes. In this paper we argue that such applications can be disentangled along two dimensions. The first is the direction of information flow in the idealized exact optimization objective, i.e., forward vs. backward inference. The second is the type of approximation used to compute this objective, e.g., Belief Propagation (BP) vs. mean field variational inference (MFVI). This new categorization allows us to unify a large number of isolated efforts in prior work, explaining their connections and differences as well as potential improvements. An extensive experimental evaluation over large stochastic planning problems shows the advantage of forward BP over several algorithms based on MFVI. An analysis of the practical limitations of MFVI motivates a novel algorithm, collapsed state variational inference (CSVI), which provides a tighter approximation and achieves planning performance comparable to forward BP.
https://proceedings.mlr.press/v186/wu22a.html

Structure learning algorithms for multidimensional continuous-time Bayesian network classifiers
Learning the structure of continuous-time Bayesian networks directly from data has traditionally been performed using score-based structure learning algorithms. Only recently has a constraint-based method been proposed, proving to be more suitable in specific settings, such as modelling systems with variables having more than two states. As a result, studying diverse structure learning algorithms is essential to learn the most appropriate models according to data characteristics and task-related priorities, such as learning speed or accuracy. This article proposes several such algorithms for learning multidimensional continuous-time Bayesian network classifiers, introducing for the first time constraint-based and hybrid algorithms for these models. These contributions also apply to the simpler one-dimensional classification problem, for which only score-based solutions exist in the literature. More specifically, the aforementioned constraint-based structure learning algorithm is first adapted to the supervised classification setting. Then, a novel algorithm of this kind, specifically tailored to the multidimensional classification problem, is presented to reduce the learning times for the induction of multidimensional classifiers. Finally, a hybrid algorithm is introduced that attempts to combine the strengths of the score- and constraint-based approaches. Experiments with synthetic data are performed not only to validate the capabilities of the proposed algorithms but also to conduct a comparative study of the available structure learning algorithms.
https://proceedings.mlr.press/v186/villa-blanco22a.html

Robust Estimation of Laplacian Constrained Gaussian Graphical Models with Trimmed Non-convex Regularization
The problem of discovering a structure that fits a collection of vector data is of crucial importance for a variety of applications. Such problems can be framed as Laplacian constrained Gaussian Graphical Model inference. Existing algorithms rely on the assumption that all the available observations are drawn from the same multivariate Gaussian distribution. However, in practice it is common to find datasets contaminated with a certain number of outliers. The purpose of this work is to address that problem. We propose a robust method based on Trimmed Least Squares that copes with the presence of corrupted samples. We provide statistical guarantees on the estimation error and present results on both simulated and real-world data.
https://proceedings.mlr.press/v186/vargas-vieyra22a.html

Interpreting Time-Varying Dynamic Bayesian Networks for Earth Climate Modelling
Bayesian networks tend to be considered transparent and interpretable, but big and dense networks become harder to understand. This is the case for non-stationary, and more generally time-varying, dynamic Bayesian networks, as the relations change over time and cannot be represented with a single template model. We introduce methods to explain how the model evolves qualitatively over time and to quantify these changes. In addition, we offer a functional implementation for time-varying dynamic Bayesian networks that includes our explainability proposals and some extensions targeted at simplifying the networks in the specific field of climate sciences.
https://proceedings.mlr.press/v186/valero-leal22a.html

Recursive autonomy identification-based learning of augmented naive Bayes classifiers
Earlier reports have described the classification accuracies of exactly learned augmented naive Bayes (ANB) classifiers. Those results indicate that a classifier whose class variable has no parent achieves higher accuracy than other Bayesian network classifiers. Additionally, asymptotic estimation of the class posterior identical to that of the exactly learned Bayesian network is guaranteed to be achieved. Nevertheless, exact learning of large ANB classifiers is difficult because it entails an NP-hard problem that worsens as the number of variables increases. Recent reports have shown that constraint-based learning methods with the Bayes factor can learn larger network structures than traditional methods can. This study proposes an efficient learning algorithm for ANB classifiers using recursive autonomy identification (RAI) with the Bayes factor. A unique benefit of the proposed method is that it is guaranteed to accelerate the execution of the RAI algorithm when the data follow an ANB structure. Numerical experiments were conducted to demonstrate the effectiveness of the proposed method.
https://proceedings.mlr.press/v186/sugahara22a.html

Bayesian Model Averaging of Chain Event Graphs for Robust Explanatory Modelling
Chain Event Graphs (CEGs) are a widely applicable class of probabilistic graphical model that can represent context-specific independence statements and asymmetric unfoldings of events in an easily interpretable way. Existing model selection literature on CEGs has largely focused on obtaining the maximum a posteriori (MAP) CEG. However, MAP selection is well known to ignore model uncertainty.
Here, we explore the use of Bayesian model averaging over this class. We demonstrate how this approach can quantify model uncertainty and lead to more robust inference by identifying features shared across multiple high-scoring models. Because the space of possible CEGs is huge, scoring models exhaustively for model averaging is prohibitive in all but small problems. However, we provide a simple modification of an existing model selection algorithm that samples the model space, to illustrate the efficacy of Bayesian model averaging compared to more standard MAP modelling.
https://proceedings.mlr.press/v186/strong22a.html

Scalable Bayesian Network Structure Learning with Splines
The graph structure of a Bayesian network (BN) can be learned from data using the well-known score-and-search approach. Previous work has shown that incorporating structured representations of the conditional probability distributions (CPDs) into the score-and-search approach can improve the accuracy of the learned graph. In this paper, we present a novel approach capable of learning the graph of a BN while simultaneously modelling linear and non-linear local probabilistic relationships between variables. We achieve this by combining feature selection, which reduces the search space for local relationships, with an extension of the score-and-search approach that models the CPDs over variables as Multivariate Adaptive Regression Splines (MARS). MARS are polynomial regression models represented as piecewise spline functions. We show on a set of discrete and continuous benchmark instances that our proposed approach can improve the accuracy of the learned graph while scaling to instances with a large number of variables.
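
The splines abstract above models each CPD as a MARS regression. The sketch below is a minimal illustration of what such a local model computes. It evaluates a MARS mean function built from the hinge functions max(0, x - t) and max(0, t - x), then plugs that mean into a Gaussian log-density. Several pieces are assumptions made for illustration and are not the paper's implementation: the Gaussian noise model, the term encoding, and the names `hinge`, `mars_mean` and `gaussian_log_density`.

```python
import math

# A MARS term is a coefficient times a product of hinge factors.
# Each factor is (parent_index, knot, direction): direction=+1 gives
# max(0, x - knot) and direction=-1 gives max(0, knot - x).


def hinge(x, knot, direction):
    """Piecewise-linear hinge basis function used by MARS."""
    return max(0.0, direction * (x - knot))


def mars_mean(parents, intercept, terms):
    """Evaluate intercept + sum_k coef_k * prod_j hinge(...) at one parent configuration."""
    y = intercept
    for coef, factors in terms:
        product = 1.0
        for index, knot, direction in factors:
            product *= hinge(parents[index], knot, direction)
        y += coef * product
    return y


def gaussian_log_density(child, parents, intercept, terms, sigma):
    """Hypothetical CPD: child ~ Normal(mars_mean(parents), sigma^2)."""
    mu = mars_mean(parents, intercept, terms)
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (child - mu) ** 2 / (2 * sigma ** 2)


if __name__ == "__main__":
    # One main-effect term on parent 0 and one interaction between parents 0 and 1.
    terms = [(2.0, [(0, 1.0, +1)]), (0.5, [(0, 1.0, +1), (1, 5.0, -1)])]
    print(mars_mean([3.0, 4.0], 1.0, terms))  # 1 + 2*2 + 0.5*2*1 = 6.0
```

Each hinge is zero on one side of its knot. As a result, the mean is piecewise linear in each parent, and piecewise polynomial where hinges are multiplied. This is how a single CPD can capture both linear and non-linear dependencies.
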
https://proceedings.mlr.press/v186/sharma22a.html

Using Mixed-Effects Models to Learn Bayesian Networks from Related Data Sets
We commonly assume that data are a homogeneous set of observations when learning the structure of Bayesian networks. However, they often comprise different data sets that are related but not homogeneous because they have been collected in different ways or from different populations. In previous work, we proposed a closed-form Bayesian Hierarchical Dirichlet score for discrete data that pools information across related data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. In this paper, we provide an analogous solution for learning a Bayesian network from continuous data, using mixed-effects models to pool information across the related data sets. We study its structural, parametric, predictive and classification accuracy, and we show that it outperforms both conditional Gaussian Bayesian networks (which do not perform any pooling) and classical Gaussian Bayesian networks (which disregard the heterogeneous nature of the data). The improvement is marked for low sample sizes and for unbalanced data sets.
https://proceedings.mlr.press/v186/scutari22a.html

A Reparameterization of Mixtures of Truncated Basis Functions and its Applications
Mixtures of truncated basis functions (MoTBFs) are a popular tool within the context of hybrid Bayesian networks, mainly because they are compatible with efficient probabilistic inference schemes. However, their standard parameterization allows negative mixture weights as well as non-normalized mixture terms, which prevents them from benefiting from existing likelihood-based mixture estimation methods like the EM algorithm.
Furthermore, the standard parameterization does not facilitate the definition of a Bayesian framework, ideally one allowing conjugate analysis. In this paper we show how MoTBFs can be reparameterized by applying a strategy already used in the literature for Gaussian mixture models with negative terms. We illustrate how the new parameterization is compatible with the EM algorithm and conjugate analysis.
https://proceedings.mlr.press/v186/salmeron22a.html

Model inclusion lattice of coloured Gaussian graphical models for paired data
We consider the problem of learning a graphical model when the observations come from two groups sharing the same variables. Unlike in the usual approach to the joint learning of graphical models, the two groups do not correspond to different populations and therefore produce dependent samples. A Gaussian graphical model for paired data may be implemented by applying the methodology developed for the family of graphical models with edge and vertex symmetries, also known as coloured graphical models. We identify a family of coloured graphical models suited to the paired data problem and investigate the structure of the corresponding model space. More specifically, we provide a comprehensive description of the lattice structure formed by this family of models under the model inclusion order. Furthermore, we give rules for computing the join and meet operations between models, which are useful in exploring the model space. These are then applied to implement a stepwise model search procedure, and an application to the identification of a brain network from fMRI data is given.
https://proceedings.mlr.press/v186/roverato22a.html

Knowledge transfer for learning subject-specific causal models
Subject-specific causal models are appropriate for domains such as biology, medicine, and neuroscience, where the causal relations vary across the individuals of a population. However, learning them can be challenging, particularly with limited data sets. Although some works have addressed this issue, they are restricted to discovering models up to Markov equivalence classes. In this work, we propose a method for identifying the causal relations of subject-specific models. We hypothesized that transferring related data sets and locally performing interventions improves the identification of the causal direction of relations. We evaluated the method on true and imperfect Markov equivalence classes of synthetic causal Bayesian networks. The method performs interventions over several subsets of the candidate parents and uses related data according to their differences from the targets. The experimental results show that it recovers a higher number of correctly oriented edges.
https://proceedings.mlr.press/v186/rodri-guez-lopez22a.html

Relevance for Robust Bayesian Network MAP-Explanations
In the context of explainable AI, the concept of MAP-independence was recently introduced as a means for conveying the (ir)relevance of intermediate nodes for MAP computations in Bayesian networks. In this paper, we further study the concept of MAP-independence, discuss methods for finding sets of relevant nodes, and suggest ways to use these in providing users with an explanation concerning the robustness of the MAP result.
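
The relevance abstract above builds on MAP-independence. The brute-force sketch below assumes a particular reading of that concept: a set R of intermediate nodes is irrelevant if fixing R to any of its values leaves the MAP explanation for H unchanged. It works over an explicit table P(h, r, e) for a fixed evidence e. The numbers and the names `map_explanation` and `is_map_independent` are hypothetical. In a real network these quantities would come from inference, not enumeration.

```python
def map_explanation(joint, h_values, r_values):
    """MAP over H with R marginalized out: argmax_h sum_r P(h, r, e)."""
    return max(h_values, key=lambda h: sum(joint[(h, r)] for r in r_values))


def is_map_independent(joint, h_values, r_values):
    """True if the MAP explanation is the same for every value r of R.

    Fixing R = r gives argmax_h P(h | r, e), which equals argmax_h P(h, r, e)
    because the normalizing constant P(r, e) does not depend on h.
    """
    h_star = map_explanation(joint, h_values, r_values)
    return all(
        max(h_values, key=lambda h: joint[(h, r)]) == h_star for r in r_values
    )


if __name__ == "__main__":
    # joint[(h, r)] = P(H = h, R = r, e) for the observed evidence e (hypothetical numbers).
    robust = {(0, 0): 0.40, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.10}
    fragile = {(0, 0): 0.50, (0, 1): 0.05, (1, 0): 0.10, (1, 1): 0.35}
    print(is_map_independent(robust, [0, 1], [0, 1]))   # True
    print(is_map_independent(fragile, [0, 1], [0, 1]))  # False: R = 1 flips the MAP to H = 1
```

Enumerating every value of R grows exponentially with the size of R. This is why methods for finding relevant sets matter beyond toy tables.
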
https://proceedings.mlr.press/v186/renooij22a.html

On the rank of 2×2×2 probability tables
Bayesian networks for real-world problems typically satisfy the property of positive monotonicity. For example, in educational testing it is commonly assumed that answering question A correctly increases the probability of answering another question B correctly. In this paper, we study the relations between positive monotonic influences on three-variable patterns and a family of 2×2×2 tensors. We use the Kruskal polynomial, well known in the psychometrics community, which is equivalent to Cayley’s hyperdeterminant (a homogeneous polynomial of degree 4 in the 8 entries of a 2×2×2 tensor). It is known that when the Kruskal polynomial is positive, the rank of the tensor is two. We show that when a probability table associated with three random variables obeys the positive monotonicity property, its corresponding 2×2×2 tensor has rank two. Moreover, it can be decomposed using only nonnegative tensors, each of which can be given a statistical interpretation. We study two concepts of monotonicity in sets of three random variables:
- strong monotonicity: any two variables have a positive influence on the third one;
- weak monotonicity: just one pair of variables has a positive influence on the third one.
We give an example showing that they do not coincide. Furthermore, we prove that the strong monotonicity property implies that the tensor rank is at most two. We also performed experiments with real data to test the monotonicity properties. The real datasets consist of information from the Czech high school final exam from the years 2016 to 2022. These datasets are representative, since the sample size (number of students) for each year is very large (N > 10000) and the information comes from students in all regions of the Czech Republic.
In these datasets, we observed that almost all 2×2×2 probability tables are monotone and that all their corresponding 2×2×2 tensors have a nonnegative decomposition.
https://proceedings.mlr.press/v186/perez22a.html

Anytime Learning of Sum-Product and Sum-Product-Max Networks
Prominent algorithms for learning sum-product networks (SPNs) and sum-product-max networks (SPMNs) focus on learning models from data that deliver good modeling performance without regard to the size of the learned network. Consequently, the learned networks can get very large, which negatively impacts inference time. In this paper, we introduce anytime algorithms for learning SPNs and SPMNs. These algorithms generate intermediate but provably valid models whose performance progressively improves as more time and computational resources are allocated to learning. They flexibly trade off good model performance against reduced learning time, offering the benefit that SPNs and SPMNs of small sizes (but with reduced likelihoods) can be learned quickly. We comprehensively evaluate the anytime algorithms on two testbeds and demonstrate that the network performance improves with time and reflects the expected performance profile of an anytime algorithm. We expect these anytime algorithms to become the default learning techniques for SPNs and SPMNs given their clear benefit over classical batch learning techniques.
https://proceedings.mlr.press/v186/pawar22a.html

Graphical Representations for Algebraic Constraints of Linear Structural Equations Models
The observational characteristics of a linear structural equation model can be effectively described by polynomial constraints on the observed covariance matrix. However, these polynomials can be exponentially large, making them impractical for many purposes.
In this paper, we present a graphical notation for many of these polynomial constraints. The expressive power of this notation is investigated both theoretically and empirically.
https://proceedings.mlr.press/v186/ommen22a.html

Causal Discovery and Reinforcement Learning: A Synergistic Integration
Both Reinforcement Learning (RL) and Causal Modeling (CM) are indispensable parts of the road towards general artificial intelligence. However, they are usually treated separately, despite the fact that both areas can effectively complement each other in problem solving. On the one hand, the interventional nature of the data-generating process in RL favors the discovery of the underlying causal structure. On the other hand, if an agent knows the possible consequences of its actions, given by causal models, it can select them better, reducing exploration and therefore accelerating the learning process. Also, ensuring that such an agent maintains a causal model of the world it operates in improves interpretability and transfer learning, among other benefits. In this article, we propose a combination strategy to provide an intelligent agent with the ability to simultaneously learn and use causal models in the context of reinforcement learning. The proposed method learns a Causal Dynamic Bayesian Network for each of the agent's actions and uses those models to improve the action selection process. To test our algorithm, experiments were performed on a simple synthetic scenario called the “coffee-task”. Our method achieves better results in policy learning than a traditional model-free algorithm (Q-Learning), and it also learns the underlying causal models. We believe that the results obtained reveal several interesting and challenging directions for future work.
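
The RL abstract above uses tabular Q-Learning as its model-free baseline. For reference, here is a minimal sketch of the standard one-step update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') - Q(s, a)]. The step size α = 0.1, the discount γ = 0.9, and the state and action labels are illustrative assumptions. The sketch does not include the paper's causal-model-based action selection.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update in place and return the new Q(state, action)."""
    # Greedy bootstrap over the next state's actions (off-policy target).
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    actions = ["go", "stay"]          # hypothetical action labels
    q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
    q[("s1", "stay")] = 2.0
    print(q_learning_update(q, "s0", "go", 1.0, "s1", actions))
```

A model-free learner like this must experience every action's consequences to learn their value. The abstract argues that knowing those consequences from a causal model can reduce exactly this exploration.
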
https://proceedings.mlr.press/v186/mendez-molina22a.html

A Transformational Characterization of Unconditionally Equivalent Bayesian Networks
We consider the problem of characterizing Bayesian networks up to unconditional equivalence, i.e., when directed acyclic graphs (DAGs) have the same set of unconditional d-separation statements. Each unconditional equivalence class (UEC) is uniquely represented by an undirected graph whose clique structure encodes the members of the class. Via this structure, we provide a transformational characterization of unconditional equivalence: we show that two DAGs are in the same UEC if and only if one can be transformed into the other via a finite sequence of specified moves. We also extend this characterization to the essential graphs representing the Markov equivalence classes (MECs) in the UEC. UECs form a partition coarsening of the space of MECs and are easily estimable from marginal independence tests. Thus, a characterization of unconditional equivalence has applications in methods that involve searching the space of MECs of Bayesian networks.
https://proceedings.mlr.press/v186/markham22a.html

A Decision Support System to Predict Acute Fish Toxicity
We present a decision support system using a Bayesian network to predict acute fish toxicity from multiple lines of evidence. Fish embryo toxicity testing has been proposed as an alternative to using juvenile or adult fish in acute toxicity testing for hazard assessments of chemicals. The European Chemicals Agency has recommended the development of a so-called weight-of-evidence approach for strengthening the evidence from fish embryo toxicity testing.
In the ecotoxicology and ecological risk assessment community, weight-of-evidence approaches have been largely qualitative in the past. We have instead developed a Bayesian network for using fish embryo toxicity data in a quantitative approach. The system enables users to efficiently predict the potential toxicity of a chemical substance based on multiple types of evidence, including physical and chemical properties, quantitative structure-activity relationships, toxicity to algae and daphnids, and fish gill cytotoxicity. The system is demonstrated on three chemical substances with different levels of toxicity. It is considered a promising step towards a probabilistic weight-of-evidence approach to predicting acute fish toxicity from fish embryo toxicity.
https://proceedings.mlr.press/v186/madsen22b.html

Online Updating of Conditional Linear Gaussian Bayesian Networks
This paper presents a method for online updating of conditional distributions in Bayesian network models with both discrete and continuous variables. The method extends known procedures for updating discrete conditional probability distributions with techniques to cope with conditional Gaussian density functions. The method has a solid foundation for known cases and may be generalised by a heuristic scheme for fractional updating when discrete parents are not known. A fading mechanism is described to prevent the system from becoming too conservative as cases accumulate over long time periods. The effect of the online updating is illustrated by an application to predict the number of waiting patients at the emergency department at Aalborg University Hospital.
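
The online-updating abstract above mentions a fading mechanism. The sketch below shows the general idea under one assumption: exponential fading, in which accumulated experience is multiplied by a factor q ≤ 1 before each new case is added. It covers two cases:
- one discrete distribution, using Dirichlet-style counts for a single parent configuration;
- one Gaussian without continuous parents, using weighted sufficient statistics.
The class names and the default q = 0.9 are hypothetical. The paper's conditional linear Gaussian update, which also handles continuous parents, is not reproduced here.

```python
class FadingCounts:
    """Discrete distribution estimated from faded pseudo-counts."""

    def __init__(self, initial_counts, fading=0.9):
        self.counts = list(initial_counts)  # prior pseudo-counts (experience)
        self.fading = fading                # q = 1.0 means no fading

    def update(self, observed_state):
        # Old experience is discounted, then the new case adds one count.
        self.counts = [self.fading * c for c in self.counts]
        self.counts[observed_state] += 1.0

    def probabilities(self):
        total = sum(self.counts)
        return [c / total for c in self.counts]


class FadingGaussian:
    """Mean and variance estimated from faded weighted sufficient statistics."""

    def __init__(self, fading=0.9):
        self.fading = fading
        self.weight = 0.0   # faded number of cases
        self.sum_y = 0.0    # faded sum of observations
        self.sum_y2 = 0.0   # faded sum of squared observations

    def update(self, y):
        q = self.fading
        self.weight = q * self.weight + 1.0
        self.sum_y = q * self.sum_y + y
        self.sum_y2 = q * self.sum_y2 + y * y

    def mean(self):
        return self.sum_y / self.weight

    def variance(self):
        m = self.mean()
        return self.sum_y2 / self.weight - m * m


if __name__ == "__main__":
    counts = FadingCounts([1.0, 1.0], fading=0.5)
    counts.update(0)
    print(counts.probabilities())  # [0.75, 0.25]
```

With q < 1, the faded weight approaches 1/(1 - q) cases, so an old case's influence decays geometrically. The estimates therefore keep tracking recent data instead of freezing as cases accumulate.
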
https://proceedings.mlr.press/v186/madsen22a.html

A Hardware Perspective to Evaluating Probabilistic Circuits
The ever-increasing development of AI-enhanced Internet-of-Things devices has recently pushed the need for on-device computation of AI models. As these tasks require making robust predictions under uncertainty, probabilistic (graphical) models have recently gained interest for these applications too. However, embedded computation requires high computational efficiency (i.e., high speed and low power) through hardware acceleration. Although the acceleration of deep learning models has shown extensive benefits, this has not yet translated to probabilistic models. Probabilistic circuits (PCs), a family of tractable probabilistic models, allow a direct hardware view, as they are represented in the form of a computational graph. Over the years, various approaches to structure learning of PCs have been proposed, but without consideration of their potential hardware cost. In this work, we propose to take a hardware perspective in the evaluation of PC structures. We compare several structure learning strategies, associating each PC with hardware costs (computation power, speed, efficiency), and evaluate which one leads to more hardware-friendly implementations. Our results show that models imposing additional structural constraints on the PC are competitive in terms of performance while being generally more hardware-efficient, making them suitable candidates for energy-constrained applications.
https://proceedings.mlr.press/v186/leslin22a.html

Highly Efficient Structural Learning of Sparse Staged Trees
Several structural learning algorithms for staged tree models, an asymmetric extension of Bayesian networks, have been defined.
However, they do not scale efficiently as the number of variables considered increases. Here we introduce the first scalable structural learning algorithm for staged trees, which searches over a space of models where only a small number of dependencies can be imposed. A simulation study as well as a real-world application illustrate our routines and the practical use of such data-learned staged trees.
https://proceedings.mlr.press/v186/leonelli22a.html

Speeding up approximate MAP by applying domain knowledge about relevant variables
The MAP problem in Bayesian networks is notoriously intractable, even when approximated. In an earlier paper we introduced the Most Frugal Explanation heuristic approach to solving MAP. It partitions the set of intermediate variables (neither observed nor part of the MAP variables) into relevant variables, which are marginalized out, and irrelevant variables, which are assigned a value sampled from their domain. In this study we explore whether knowledge about which variables are relevant for a particular query (i.e., domain knowledge) speeds up computation enough to beat both exact and approximate MAP while giving reasonably accurate results. Our results are inconclusive, but also show that this probably depends on the specifics of the MAP query, most prominently the number of MAP variables.
https://proceedings.mlr.press/v186/kwisthout22a.html

Learning Noisy-Or Networks with an Application in Linguistics
In this paper we discuss the issue of learning Bayesian networks whose conditional probability tables (CPTs) are either noisy-or models or general CPTs. We refer to these models as Mixed Noisy-Or Bayesian Networks.
In order to learn the structure of such Bayesian networks, we modify the Bayesian Information Criterion (BIC) used for general Bayesian networks so that it reflects the number of parameters of a noisy-or model. We prove that the log-likelihood function of a noisy-or model has a unique maximum, and we adapt the EM-learning method for leaky noisy-or models. We evaluate the proposed approach on synthetic data, where it performs substantially better than general BNs. We also apply this approach to a problem from the domain of linguistics: we use Mixed Noisy-Or Bayesian Networks to model the spread of loanwords in the South-East Asia Archipelago. We perform numerical experiments in which we compare the predictive ability of general Bayesian Networks with that of Mixed Noisy-Or Bayesian Networks.
https://proceedings.mlr.press/v186/kratochvil22a.html

Explaining Deep Tractable Probabilistic Models: The sum-product network case
We consider the problem of explaining a class of tractable deep probabilistic models, Sum-Product Networks (SPNs), and present an algorithm, ExSPN, to generate explanations. To this end, we define the notion of a context-specific independence tree (CSI-tree) and present an iterative algorithm that converts an SPN to a CSI-tree. The resulting CSI-tree is both interpretable and explainable to the domain expert. We achieve this by extracting the conditional independencies encoded by the SPN and approximating the local context specified by the structure of the SPN. Our extensive empirical evaluations on synthetic, standard, and real-world clinical data sets demonstrate that the CSI-tree exhibits superior explainability.
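
The SPN abstract above, like the probabilistic-circuit abstract earlier in the list, treats an SPN as a computational graph. The sketch below shows bottom-up evaluation under two assumptions: the leaves are Bernoulli, and the nodes are listed children-first. One pass computes a joint or marginal probability. A leaf whose variable is absent from the evidence evaluates to 1, which marginalizes that variable out. The `Node` class and the example network are hypothetical, and the sketch implements neither ExSPN nor the CSI-tree conversion.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Optional


@dataclass
class Node:
    kind: str                                   # "leaf", "product" or "sum"
    children: List[int] = field(default_factory=list)
    weights: List[float] = field(default_factory=list)
    var: int = -1                               # leaf variable index
    p: float = 0.0                              # leaf parameter P(var = 1)


def evaluate(nodes: List[Node], evidence: Dict[int, Optional[int]]) -> float:
    """Evaluate the network bottom-up; the root is the last node."""
    values: List[float] = []
    for node in nodes:
        if node.kind == "leaf":
            x = evidence.get(node.var)
            # Missing evidence: p + (1 - p) = 1, i.e. the variable is summed out.
            values.append(1.0 if x is None else (node.p if x == 1 else 1.0 - node.p))
        elif node.kind == "product":
            v = 1.0
            for c in node.children:
                v *= values[c]
            values.append(v)
        else:  # weighted sum node
            values.append(sum(w * values[c] for w, c in zip(node.weights, node.children)))
    return values[-1]


# Hypothetical SPN over binary X0, X1: a mixture of two fully factorized components.
NODES = [
    Node("leaf", var=0, p=0.9), Node("leaf", var=1, p=0.2),
    Node("leaf", var=0, p=0.3), Node("leaf", var=1, p=0.7),
    Node("product", children=[0, 1]), Node("product", children=[2, 3]),
    Node("sum", children=[4, 5], weights=[0.6, 0.4]),
]

if __name__ == "__main__":
    total = sum(evaluate(NODES, {0: a, 1: b}) for a, b in product([0, 1], repeat=2))
    print(round(total, 10))                     # 1.0: the SPN is normalized
    print(round(evaluate(NODES, {0: 1}), 10))   # 0.66 = P(X0 = 1), with X1 marginalized
```

A single pass touches every node once, so evaluation cost is linear in the size of the circuit. This fixed dataflow is what makes a direct hardware view of such models natural.
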
https://proceedings.mlr.press/v186/karanam22a.html

A Hybrid Algorithm for Learning Causal Networks using Uncertain Experts’ Knowledge
Bayesian networks (BNs) have become one of the most popular frameworks in causal studies. The causal relations between variables are encoded by the structure of the model, which is a directed acyclic graph (DAG). Unfortunately, despite significant advances in algorithm development, learning the causal structure from data remains a very challenging task, especially for cases with a large number of variables. When the learning algorithm fails to identify the causal orientation of some edges, a human expert can provide rough guidelines to complete the causal discovery. In many application domains, the expert may be uncertain about the right orientation of an edge. Worse, the expert knowledge may contradict the orientations learned from observational data, leading to conflicting situations. This paper presents a new hybrid algorithm combining a constraint-based approach with a greedy search. The algorithm includes specific rules to cope with uncertain domain/expert knowledge at different steps of the learning process. Experiments show the robustness of our method compared to other state-of-the-art algorithms.
https://proceedings.mlr.press/v186/gonzales22a.html

The Dual PC Algorithm for Structure Learning
Learning the graphical structure of Bayesian networks is key to describing data-generating mechanisms in many complex applications, and it poses considerable computational challenges. Observational data can only identify the equivalence class of the directed acyclic graph underlying a Bayesian network model, and a variety of methods exist to tackle the problem.
Under certain assumptions, the popular PC algorithm can consistently recover the correct equivalence class by reverse-engineering the conditional independence (CI) relationships holding in the variable distribution. Here, we propose the dual PC algorithm, a novel scheme to carry out the CI tests within the PC algorithm by leveraging the inverse relationship between covariance and precision matrices. By exploiting block matrix inversions, we can efficiently supplement the partial correlation tests at each step with those of complementary (or dual) conditioning sets. The multiple CI tests of the dual PC algorithm proceed by first considering marginal and full-order CI relationships and progressively moving to central-order ones. Simulation studies show that the dual PC algorithm outperforms the classic PC algorithm both in run time and in recovering the underlying network structure, even in the presence of deviations from Gaussianity.
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/giudice22a.html

Convergence of Feedback Arc Set-Based Heuristics for Linear Structural Equation Models
Score-based structure learning in Bayesian networks, where local structures in the graph are given a score and one seeks to recover a high-scoring DAG from data, is an NP-hard problem. While the general learning problem is combinatorial, the more restricted framework of linear structural equation models (SEMs) enables learning Bayesian networks using continuous optimization methods. Large-scale structure learning has become an important problem in linear SEMs, and many approximate methods have been developed to address it. Among them, feedback arc set-based methods learn the DAG by alternating between an unconstrained gradient-descent-based step that optimizes an objective function and the solution of a maximum acyclic subgraph problem that enforces acyclicity.
In the present work, we build upon previous contributions on such heuristics, first by establishing a mathematical convergence analysis, which was previously lacking, and second by showing empirically how convergence can be sped up significantly in practice using simple warm-starting strategies.
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/gillot22a.html

Who did it? Identifying the Most Likely Origins of Events
One probabilistic inference task is answering queries for conditional marginal distributions given a set of events. In this paper, we investigate the problem in which we only know that events are observed, from a number of sensors or for a number of individuals, but not which sensors or individuals exhibit those events. This situation might occur in multi-agent settings, such as nanosystems, where single agents can no longer be tracked. However, to perform probabilistic inference, those events need to be mapped to random variables, specifically to the ones most likely to exhibit them. For the mapping, we show how lifting allows us to generate all the different possibilities for mapping those events, since it operates over sets of indistinguishable random variables; this leads to a set of queries. Given the mapping that yields the most likely answer, we can construct evidence with which to perform probabilistic inference. Finally, we compare solving the problem at the propositional level, which cannot be done in reasonable time, with our approach, which returns liftable evidence for tractable inference.
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/gehrke22a.html

Online Single-Microphone Source Separation using Non-Linear Autoregressive Models
In this paper, a modular approach to single-microphone source separation is proposed.
A probabilistic model for mixtures of observations is constructed, in which the independent underlying source signals are described by non-linear autoregressive models. Source separation in this model is achieved by performing online probabilistic inference through an efficient message passing procedure. To retain tractability with the non-linear autoregressive models, three different approximation methods are described. A set of experiments shows the effectiveness of the proposed source separation approach, and the source separation performance of the different approximation methods is quantified through a set of verification experiments. Our approach is validated in a speech denoising task.
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/erp22a.html

Limited Memory Influence Diagrams for Attribute Statistical Process Control with Variable Sample Sizes
Limited Memory Influence Diagrams (LIMIDs) are implemented for statistical process control (SPC) to monitor the quality of the output of a production process in which the number of defective units in a sample is measured at each time period. The observed defectives provide the input to a decision on whether to stop the process and repair a problematic cause of variation. The model also allows the decision maker to increase the size of the next sample in order to better discern whether the process actually requires investigation. The model only requires the user to know the size and result of the current sample to make a decision, in contrast to Bayesian methods that require calculations based on all prior samples and a history of actions. Despite the limited information, the model achieves quality costs competitive with those of existing methods over a wide range of production time horizons.
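The LIMID abstract stresses that a decision can be made from the current sample alone. The sketch below is not the authors' LIMID; it only illustrates that kind of memoryless rule, which sees nothing but the sample size n and the defect count d. The fixed prior probability of an out-of-control process, the two defect rates, and the two costs are hypothetical values chosen for illustration, and the option to enlarge the next sample is not modelled.

```python
import math


def log_likelihood(defects, n, rate):
    # Binomial log-likelihood without the binomial coefficient, which is the
    # same under both hypotheses and cancels when the posterior is normalized.
    return defects * math.log(rate) + (n - defects) * math.log(1.0 - rate)


def posterior_out_of_control(defects, n, prior_out=0.1, rate_in=0.02, rate_out=0.10):
    """P(out of control | d defects in a sample of size n); all rates hypothetical."""
    log_in = math.log(1.0 - prior_out) + log_likelihood(defects, n, rate_in)
    log_out = math.log(prior_out) + log_likelihood(defects, n, rate_out)
    top = max(log_in, log_out)  # subtract the max before exponentiating for stability
    w_in, w_out = math.exp(log_in - top), math.exp(log_out - top)
    return w_out / (w_in + w_out)


def decide(defects, n, repair_cost=100.0, out_of_control_cost=500.0, **model):
    """Stop and repair when the expected cost of continuing exceeds the repair cost."""
    post = posterior_out_of_control(defects, n, **model)
    action = "stop" if post * out_of_control_cost > repair_cost else "continue"
    return action, post


if __name__ == "__main__":
    for d in (0, 2, 5):
        action, post = decide(d, n=50)
        print(d, action, round(post, 3))
```

Because the prior is fixed rather than updated from earlier samples, the rule needs only (n, d), mirroring the abstract's point; the trade-off is that it discards the process history.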
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/cobb22a.html

Discovery and density estimation of latent confounders in Bayesian networks with evidence lower bound
Discovering and parameterising latent confounders are important and challenging problems in causal structure learning and density estimation, respectively. In this paper, we focus on both discovering latent confounders and learning their distribution. This task requires solutions drawn from different areas of statistics and machine learning. We combine elements of variational Bayesian methods, expectation-maximisation, hill-climbing search, and structure learning under the assumption of causal insufficiency. We propose two learning strategies: one that maximises model selection accuracy, and another that improves computational efficiency in exchange for minor reductions in accuracy. The former strategy is suitable for small networks and the latter for moderate-size networks. Both learning strategies perform well relative to existing solutions.
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/chobtham22a.html

Evolutive Adversarially-Trained Bayesian Network Autoencoder for Interpretable Anomaly Detection
Semi-supervised detection of outliers with only positive and unlabeled data, which is among the most frequent forms of the anomaly detection (AD) problem in real scenarios, requires a model to capture the normal behaviour of data from a training set consisting exclusively of normal-labelled data, so that new, unseen data can afterwards be compared with the induced notion of normality and flagged (or not) as anomalous. In modelling a certain pattern of behaviour, generative models such as generative adversarial networks (GANs) have shown strong performance.
Thus, numerous AD algorithms with GANs at their core have been proposed, most of them powered by deep neural networks and relying on an autoencoder for the AD task. In the present work, a novel approach to semi-supervised AD with Bayesian networks, using generative adversarial training and an evolutive strategy, is proposed; it aims to mitigate the intrinsic lack of interpretability of deep neural networks. The proposed model is tested on a real-world AD problem in cybersecurity.
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/casajus-setien22a.html

Parameterized Completeness Results for Bayesian Inference
We present completeness results for inference in Bayesian networks with respect to two different parameterizations, namely the number of variables and the topological vertex separation number. For this, we introduce the parameterized complexity classes $\mathsf{W[1]PP}$ and $\mathsf{XLPP}$, which relate to $\mathsf{W[1]}$ and $\mathsf{XNLP}$, respectively, as $\mathsf{PP}$ does to $\mathsf{NP}$. The second parameter is intended as a natural translation of the notion of pathwidth to the case of directed acyclic graphs, and as such it is a stronger parameter than the more commonly considered treewidth. Based on a recent conjecture, the completeness results for this parameter suggest that deterministic algorithms for inference require exponential space in terms of pathwidth and, by extension, treewidth. These results are intended to contribute towards a more precise understanding of the parameterized complexity of Bayesian inference and thus of its required computational resources in terms of both time and space.
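The brute-force sketch below makes the topological vertex separation number from the Parameterized Completeness abstract more concrete. It assumes the standard vertex separation number of a linear ordering: the largest number of already-placed vertices that still have a neighbour later in the ordering. It then minimizes only over topological orderings. Minimizing over all orderings instead gives the ordinary vertex separation number, which equals pathwidth. The paper's formal definition may differ in detail, the function names are hypothetical, and the enumeration is exponential, so this suits toy DAGs only.

```python
from itertools import permutations


def vertex_separation(order, edges):
    """Max over prefixes of the number of placed vertices with a neighbour not yet placed."""
    pos = {v: i for i, v in enumerate(order)}
    worst = 0
    for i in range(len(order)):
        boundary = set()
        for u, v in edges:  # edge direction is ignored when counting the boundary
            if pos[u] <= i < pos[v]:
                boundary.add(u)
            elif pos[v] <= i < pos[u]:
                boundary.add(v)
        worst = max(worst, len(boundary))
    return worst


def is_topological(order, edges):
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)


def topological_vertex_separation(vertices, edges):
    # Raises ValueError (min of an empty sequence) if the graph contains a cycle.
    return min(vertex_separation(o, edges)
               for o in permutations(vertices) if is_topological(o, edges))


def unrestricted_vertex_separation(vertices, edges):
    # Same quantity minimized over all orderings; this equals the pathwidth.
    return min(vertex_separation(o, edges) for o in permutations(vertices))


if __name__ == "__main__":
    v_structure = ["A", "B", "C"], [("A", "C"), ("B", "C")]
    print(topological_vertex_separation(*v_structure))   # 2
    print(unrestricted_vertex_separation(*v_structure))  # 1
```

In the v-structure, both parents must precede their child in every topological ordering, so both stay "open" at the same time. An unrestricted ordering can place the child between them. Because the topological version minimizes over a subset of orderings, it is always at least the pathwidth.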
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/bodlaender22a.html

You Only Derive Once (YODO): Automatic Differentiation for Efficient Sensitivity Analysis in Bayesian Networks
Sensitivity analysis measures the influence of a Bayesian network’s parameters on a quantity of interest defined by the network, such as the probability of a variable taking a specific value. In particular, the so-called sensitivity value measures the quantity of interest’s partial derivative with respect to the network’s conditional probabilities. However, finding such values in large networks with thousands of parameters can become computationally very expensive. We propose to use automatic differentiation combined with exact inference to obtain all sensitivity values in a single pass. Our method first marginalizes the whole network once using, e.g., variable elimination and then backpropagates through this operation to obtain the gradient with respect to all input parameters. We demonstrate our routines by ranking all parameters by importance on a Bayesian network modeling humanitarian crises and disasters, and then show the method’s efficiency by scaling it to huge networks with up to 100,000 parameters. An implementation of the methods using the popular machine learning library PyTorch is freely available.
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/ballester-ripoll22a.html

Integrating Bayesian network classifiers to deal with the partial label ranking problem
The label ranking problem consists of learning preference models from training datasets labeled with a (possibly incomplete) ranking of the class labels, with the goal of predicting a ranking for a given unlabeled instance.
In this work, we focus on the particular case where the training dataset and the predicted output allow tied class labels (i.e., there is no particular preference among them), known as the partial label ranking problem. This paper transforms the ranking with ties into discrete variables representing the preference relations (precedes, ties, and succeeds) among pairs of class labels. We then use Bayesian network classifiers to model the pairwise preferences. Finally, we input the posterior probabilities into the pair order matrix used to solve the corresponding rank aggregation problem at inference time. The experimental evaluation shows that our proposals are competitive in accuracy with the state-of-the-art mixture-based probabilistic graphical models while being much faster.
Mon, 19 Sep 2022 00:00:00 +0000 https://proceedings.mlr.press/v186/alfaro22a.html
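The pairwise decomposition in the partial label ranking abstract can be sketched in Python as follows. A ranking with ties is assumed to be a list of buckets of tied labels, and each label pair becomes a three-valued variable (precedes, ties, succeeds). Two further assumptions apply. When the pair order matrix is built, a tie is assumed to contribute half its probability to each direction. The final aggregation is a simple Borda-style sort by row sums, standing in for whatever rank aggregation method the paper actually uses. All function names are hypothetical, and the posteriors that the Bayesian network classifiers would supply are faked here as one-hot values.

```python
from itertools import combinations


def pairwise_relations(buckets):
    """Map a ranking with ties (list of buckets) to one relation per label pair."""
    level = {label: rank for rank, bucket in enumerate(buckets) for label in bucket}
    relations = {}
    for a, b in combinations(sorted(level), 2):
        if level[a] < level[b]:
            relations[(a, b)] = "precedes"
        elif level[a] == level[b]:
            relations[(a, b)] = "ties"
        else:
            relations[(a, b)] = "succeeds"
    return relations


def pair_order_matrix(labels, posteriors):
    """O[a][b] = P(a precedes b) + 0.5 * P(tie), an assumed convention."""
    matrix = {a: {b: 0.0 for b in labels if b != a} for a in labels}
    for (a, b), p in posteriors.items():
        matrix[a][b] = p["precedes"] + 0.5 * p["ties"]
        matrix[b][a] = p["succeeds"] + 0.5 * p["ties"]
    return matrix


def aggregate(matrix, tol=1e-9):
    """Borda-style stand-in: sort by row sums, grouping near-equal scores into ties."""
    scores = {a: sum(row.values()) for a, row in matrix.items()}
    buckets = []
    for a in sorted(scores, key=lambda x: (-scores[x], x)):
        if buckets and abs(scores[buckets[-1][0]] - scores[a]) <= tol:
            buckets[-1].append(a)
        else:
            buckets.append([a])
    return buckets


if __name__ == "__main__":
    truth = [["a"], ["b", "c"], ["d"]]
    relations = pairwise_relations(truth)
    # Pretend the classifiers are certain: one-hot posteriors over the three values.
    posteriors = {pair: {r: float(r == rel) for r in ("precedes", "ties", "succeeds")}
                  for pair, rel in relations.items()}
    print(aggregate(pair_order_matrix(["a", "b", "c", "d"], posteriors)))
    # [['a'], ['b', 'c'], ['d']]
```

With certain (one-hot) posteriors, the sketch recovers the original ranking with ties. With soft posteriors, the tolerance `tol` controls how close two scores must be before their labels are reported as tied.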