Proceedings of Machine Learning Research
Proceedings of The 3rd International Workshop on Advanced Methodologies for Bayesian Networks, 20-22 September 2017
Published as Volume 73 by the Proceedings of Machine Learning Research on 03 September 2017.
Volume Edited by:
Antti Hyttinen
Joe Suzuki
Brandon Malone
Series Editors:
Neil D. Lawrence
Mark Reid
http://proceedings.mlr.press/v73/
Thu, 04 Oct 2018 19:28:44 +0000

Multiple DAGs Learning with Non-negative Matrix Factorization
Probabilistic graphical models, e.g., Markov networks and Bayesian networks, have been well studied over the past two decades. However, it is still difficult to learn a reliable network structure, especially with limited data. Recent work has found that multi-task learning can improve the robustness of the learned networks by leveraging data from related tasks. In this paper, we focus on the estimation of the directed acyclic graph (DAG) of a Bayesian network. Most existing multi-task or transfer learning algorithms for Bayesian networks use DAG relatedness as an inductive bias in the optimization of multiple structures. More specifically, some works first find hidden structures shared among related tasks and then treat them as structure penalties in the learning step. However, current work omits the setting in which the shared hidden structure comes from different parts of different DAGs. In this paper, Non-negative Matrix Factorization (NMF) is therefore employed to learn a parts-based representation that addresses this problem. Theoretically, we show the plausibility of our approach. Empirically, we show that, compared with single-task learning, multi-task learning is better able to identify true edges on both synthetic data and real-world landmine data.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/zhou17a.html
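As a rough illustration of the parts-based idea only, the sketch below flattens the adjacency matrices of three hypothetical related DAGs into the columns of a non-negative matrix and factorizes it with the standard Lee-Seung multiplicative updates. This is plain NMF on assumed toy data, not the multi-task structure learning algorithm of the paper; the function name `nmf`, the rank `k` and the example DAGs are all assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V (m x n) as W @ H with W, H >= 0,
    using Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical 3-node DAGs for three related tasks (entry [i, j] = 1 means edge i -> j).
A1 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])   # 0->1, 1->2
A2 = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]])   # 0->1, 0->2
A3 = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 0]])   # 0->2, 1->2
# Each column of V is one flattened DAG.
V = np.stack([A.ravel() for A in (A1, A2, A3)], axis=1).astype(float)

W, H = nmf(V, k=2)
# First "part" reshaped back into a 3 x 3 edge-weight matrix.
print(np.round(W.reshape(3, 3, 2)[..., 0], 2))
print("reconstruction error:", np.linalg.norm(V - W @ H))
```

Each column of `W` can be read as a group of edges (a "part") that several tasks may share, and `H` gives each task's non-negative weight on each part; how the paper turns such factors into structure penalties is not reproduced here.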

Causal Learning and Machine Learning
Can we find the causal direction between two variables? How can we make optimal predictions
in the presence of distribution shift? We are often faced with such causal modeling
or prediction problems. Recently, with the rapid accumulation of huge volumes of data,
both causal discovery, i.e., learning causal information from purely observational data, and
machine learning are seeing exciting opportunities as well as great challenges. This talk will
be focused on recent advances in causal discovery and how causal information facilitates
understanding and solving certain problems of learning from heterogeneous data. In particular,
I will talk about basic approaches to causal discovery and address practical issues
in causal discovery, including nonstationarity or heterogeneity of the data and existence of
measurement error. Finally, I will discuss why and how underlying causal knowledge helps
in learning from heterogeneous data when the i.i.d. assumption is dropped, with transfer
learning as a particular example.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/zhang17a.html

Hidden Node Detection between Two Observable Nodes Based on Bayesian Clustering
Structure learning is one of the main concerns in the study of Bayesian networks.
In the present paper, we consider a network consisting of both observable and hidden nodes,
and propose a method to investigate whether a hidden node exists between two observable nodes,
which is a model selection problem between the networks with and without the middle hidden node.
It is known that when the network includes a hidden node, the parameter space contains singularities
and the Fisher information matrix is not positive definite.
Consequently, many conventional structure-learning criteria based on the Laplace approximation do not work.
The proposed method is based on Bayesian clustering,
and its asymptotic properties justify the result: redundant labels are eliminated and the simplest structure is detected
even in the presence of singularities.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/yamazaki17a.html

Fast Compilation of s-t Paths on a Graph for Counting and Enumeration
In this paper, we propose a new method, called merging frontier-based search, to compile $s$-$t$ simple paths on a graph. Recently, Nishino et al. proposed a top-down construction algorithm that compiles $s$-$t$ simple paths into a Zero-suppressed SDD (ZSDD), and they showed that this method is more efficient than Knuth's simpath.
However, because the method of Nishino et al. uses ZSDD as its tractable representation, compilation requires complicated steps. In this paper, we propose z-st-d-DNNF, which is a superset of ZSDD. By using it instead of ZSDD, we show that $s$-$t$ simple paths can be compiled more efficiently.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/teruji-sugaya17a.html
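For reference, the quantity these compiled representations make cheap to count or enumerate is the set of $s$-$t$ simple paths. The brute-force depth-first search below is a hypothetical baseline, not simpath, ZSDD or z-st-d-DNNF: it counts the paths on a small undirected graph, and its running time grows exponentially with graph size, which is exactly what compilation-based methods aim to avoid. The function name `count_st_paths` is an assumption.

```python
from collections import defaultdict

def count_st_paths(edges, s, t):
    """Count simple paths from s to t in an undirected graph by exhaustive DFS."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def dfs(u, visited):
        if u == t:
            return 1
        total = 0
        for w in adj[u]:
            if w not in visited:
                visited.add(w)
                total += dfs(w, visited)
                visited.remove(w)  # backtrack so other branches may reuse w
        return total

    return dfs(s, {s})

# 3 x 3 grid of vertices (r, c) with edges between horizontal and vertical neighbours.
grid = [((r, c), (r, c + 1)) for r in range(3) for c in range(2)] + \
       [((r, c), (r + 1, c)) for r in range(2) for c in range(3)]
print(count_st_paths(grid, (0, 0), (2, 2)))  # → 12
```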

Restricted Quasi Bayesian Networks as a Prototyping Tool for Computational Models of Individual Cortical Areas
We propose restricted quasi Bayesian networks as an efficient prototyping tool for designing computational models of individual cortical areas of the brain.
Restricted quasi Bayesian networks are simplified Bayesian networks
that only distinguish probability value 0 from other values.
Using our tool, it is possible to concentrate on the essential part of model design and efficiently construct prototypes.
We demonstrate that restricted quasi Bayesian networks actually work well as a prototyping tool by implementing a syntactic parser for an ambiguous English sentence.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/takahashi17a.html
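To make "only distinguish probability value 0 from other values" concrete, the sketch below keeps, for each conditional probability table, only the set of entries that are non-zero, and answers a query by listing the values that remain possible given the evidence. It is a brute-force illustration under assumptions (the three variables, their tables and the function `possible_values` are hypothetical), not the authors' prototyping tool.

```python
from itertools import product

# Hypothetical restricted quasi BN: each CPT records only which
# (value, parent values) combinations have non-zero probability.
variables = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
parents = {"A": (), "B": ("A",), "C": ("A", "B")}
allowed = {
    "A": {(0, ()), (1, ())},                  # both values of A are possible
    "B": {(0, (0,)), (1, (1,))},              # B always copies A
    # C is forced to 0 when A = B = 0, forced to 1 when A = B = 1, free otherwise.
    "C": {(0, (0, 0)), (1, (1, 1)), (0, (0, 1)), (1, (0, 1)),
          (0, (1, 0)), (1, (1, 0))},
}

def possible_values(query, evidence):
    """Return the values of `query` that have non-zero posterior probability."""
    names = list(variables)
    result = set()
    for values in product(*(variables[n] for n in names)):
        x = dict(zip(names, values))
        if any(x[k] != v for k, v in evidence.items()):
            continue
        # A joint configuration is possible iff every CPT entry it uses is non-zero.
        if all((x[n], tuple(x[p] for p in parents[n])) in allowed[n] for n in names):
            result.add(x[query])
    return result

print(possible_values("A", {"C": 1}))  # → {1}
```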

Advanced Methodologies for Bayesian Networks 2017: Preface
Preface
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/suzuki17a.html

Hyperparameter sensitivity revisited
The BDeu scoring criterion for learning Bayesian network structures is known to be very
sensitive to the equivalent sample size hyperparameter. Recently, some authors have suggested
alternative Bayesian scoring criteria that appear to behave better than BDeu. So,
is the problem solved? We will review the problem and the suggested solutions and present
an empirical assessment of the current situation.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/silander17a.html
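The sensitivity can be seen directly in the BDeu local score. With q parent configurations, r node values, counts N_ijk and equivalent sample size alpha, BDeu uses Dirichlet weights alpha/q per parent configuration and alpha/(r q) per cell. The sketch below implements that standard formula with `math.lgamma` and prints how the score gain of adding a parent changes with alpha; the function name `bdeu_local` and the counts are hypothetical, and the printed numbers are not results from the talk.

```python
from math import lgamma

def bdeu_local(counts, ess):
    """BDeu log marginal likelihood of one node.
    counts[j][k] = N_ijk: records with parent configuration j and node value k.
    ess is the equivalent sample size alpha."""
    q = len(counts)          # number of parent configurations
    r = len(counts[0])       # number of node values
    a_j = ess / q            # Dirichlet weight per parent configuration
    a_jk = ess / (r * q)     # Dirichlet weight per cell
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score

counts_with_parent = [[8, 2], [3, 7]]   # hypothetical: rows = value of parent Y, cols = value of X
counts_no_parent = [[11, 9]]            # the same 20 records with Y ignored
for ess in (0.01, 1.0, 10.0, 100.0):
    gain = bdeu_local(counts_with_parent, ess) - bdeu_local(counts_no_parent, ess)
    print(f"ess={ess:>6}: log score gain of adding Y as a parent = {gain:.3f}")
```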

Fast Message Passing Algorithm Using ZDD-Based Local Structure Compilation
Compiling Bayesian networks (BNs) into secondary structures to implement efficient exact inference is an active topic in probabilistic modeling.
One class of algorithms compiles BNs into junction tree structures by exploiting the conditional independencies in the network.
By performing message passing on the junction tree, we can efficiently calculate the marginal probabilities of any variables in the network.
However, the message passing algorithm does not exploit local structure in the network. Since the ability to exploit local structure to avoid redundant calculations has a significant impact on exact inference,
in this article, we propose a fast message passing algorithm that exploits local structure using Zero-suppressed Binary Decision Diagrams (ZDDs).
We convert all the components used in the message passing algorithm into Multi-linear Functions (MLFs) and then compile them into compact representations using ZDDs.
We show that, on some benchmark networks, message passing on ZDDs can be more efficient than the conventional message passing algorithm on junction tree structures, although it may consume too much memory on some larger instances.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/shan-gao17a.html
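The conventional junction-tree baseline can be illustrated on a hypothetical chain A -> B -> C with cliques {A,B} and {B,C} joined by separator {B}: a message sums a variable out of one clique potential and multiplies it into the neighbour. The sketch below, using NumPy with made-up probabilities, computes P(C) and P(A | C=1) this way and checks both against the full joint; it shows plain table-based message passing only, not the ZDD/MLF compilation proposed in the paper.

```python
import numpy as np

# Hypothetical chain A -> B -> C over binary variables.
p_a = np.array([0.6, 0.4])                    # P(A)
p_b_given_a = np.array([[0.7, 0.3],           # row a: P(B | A=a)
                        [0.2, 0.8]])
p_c_given_b = np.array([[0.9, 0.1],           # row b: P(C | B=b)
                        [0.4, 0.6]])

# Junction tree: clique {A,B} -- separator {B} -- clique {B,C}.
psi_ab = p_a[:, None] * p_b_given_a           # psi_ab[a, b] = P(a) P(b | a)
psi_bc = p_c_given_b                          # psi_bc[b, c] = P(c | b)

# Message {A,B} -> {B,C}: sum A out of the first clique potential.
msg_to_bc = psi_ab.sum(axis=0)                # = P(B)
p_c = (msg_to_bc[:, None] * psi_bc).sum(axis=0)

# Evidence C = 1 as an indicator vector; message {B,C} -> {A,B} sums C out.
lam_c = np.array([0.0, 1.0])
msg_to_ab = psi_bc @ lam_c                    # = P(C=1 | B)
p_a_and_e = (psi_ab * msg_to_ab[None, :]).sum(axis=1)
p_a_given_e = p_a_and_e / p_a_and_e.sum()

# Brute-force check against the full joint P(a, b, c).
joint = np.einsum("a,ab,bc->abc", p_a, p_b_given_a, p_c_given_b)
assert np.allclose(p_c, joint.sum(axis=(0, 1)))
assert np.allclose(p_a_given_e, joint[:, :, 1].sum(axis=1) / joint[:, :, 1].sum())
print("P(C) =", p_c, " P(A | C=1) =", p_a_given_e)
```

For these numbers P(C) = (0.65, 0.35) and P(A | C=1) = (3/7, 4/7).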

Dirichlet Bayesian Network Scores and the Maximum Entropy Principle
A classic approach for learning Bayesian networks from data is to select the
maximum a posteriori (MAP) network. In the case of discrete Bayesian
networks, the MAP network is selected by maximising one of several possible
Bayesian Dirichlet (BD) scores; the most famous is the Bayesian
Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (1995). The key
properties of BDeu arise from its underlying uniform prior, which makes
structure learning computationally efficient, does not require the elicitation
of prior knowledge from experts, and satisfies score equivalence.
In this paper we will discuss the impact of this uniform prior on structure
learning from an information theoretic perspective, showing how BDeu may
violate the maximum entropy principle when applied to sparse data and how it
may also be problematic from a Bayesian model selection perspective. On the
other hand, the BDs score proposed in Scutari (2016) arises from a piecewise
prior and it does not appear to violate the maximum entropy principle, even
though it is asymptotically equivalent to BDeu.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/scutari17a.html
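The difference between the two priors can be sketched with a generic BD local score that takes a per-cell Dirichlet weight for each parent configuration. BDeu spreads alpha/(r q) over all q parent configurations; as we read the definition in Scutari (2016), BDs instead gives alpha/(r q_obs) to each cell of the q_obs configurations actually observed and zero to the others. The function names `bd_local`, `bdeu` and `bds` and the sparse counts are hypothetical.

```python
from math import lgamma

def bd_local(counts, alpha_jk):
    """Generic BD log marginal likelihood for one node.
    counts[j][k] = N_ijk; alpha_jk[j] is the per-cell Dirichlet weight for row j.
    Rows with zero prior weight must also have zero counts and are skipped."""
    r = len(counts[0])
    score = 0.0
    for row, a in zip(counts, alpha_jk):
        if a == 0:
            continue
        n_j = sum(row)
        score += lgamma(r * a) - lgamma(r * a + n_j)
        score += sum(lgamma(a + n) - lgamma(a) for n in row)
    return score

def bdeu(counts, ess):
    # Uniform prior: ess spread over all q parent configurations.
    q, r = len(counts), len(counts[0])
    return bd_local(counts, [ess / (r * q)] * q)

def bds(counts, ess):
    # Piecewise prior (our reading of BDs): ess spread only over observed
    # parent configurations. Assumes at least one observed record.
    r = len(counts[0])
    observed = [sum(row) > 0 for row in counts]
    q_obs = sum(observed)
    return bd_local(counts, [ess / (r * q_obs) if o else 0.0 for o in observed])

sparse = [[5, 0], [0, 5], [0, 0], [0, 0]]   # hypothetical: 4 parent configurations, 2 observed
print(bdeu(sparse, 1.0), bds(sparse, 1.0))
```

When every parent configuration appears in the data the two scores coincide; with sparse data BDeu effectively uses a smaller imaginary sample size on the configurations that were observed.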

Improved Local Search in Bayesian Networks Structure Learning
We present a novel approach for score-based structure learning of Bayesian networks, which couples an existing ordering-based algorithm for structure optimization with a novel operator for exploring the neighborhood of a given order in the space of orderings. Our approach achieves state-of-the-art performance on data sets containing thousands of variables.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/scanagatta17a.html
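Ordering-based search works on orders rather than DAGs: for a fixed order, each node independently picks its best cached parent set among its predecessors, so an order's score is easy to evaluate, and local search moves between neighbouring orders. The sketch below uses plain adjacent swaps as the neighbourhood, a standard simple choice and not the novel operator proposed in the paper, together with a hypothetical cache of local scores. On this toy cache the swap-based hill climb can stop at a local optimum, which illustrates why richer neighbourhood operators matter.

```python
from itertools import permutations

# Hypothetical cache of local scores (higher is better): node -> {parent set: score}.
local_scores = {
    "A": {frozenset(): -10.0, frozenset({"B"}): -8.0},
    "B": {frozenset(): -12.0, frozenset({"C"}): -7.0},
    "C": {frozenset(): -9.0, frozenset({"A"}): -8.5},
}

def score_order(order):
    """Best total score over DAGs consistent with `order`: each node picks its
    best cached parent set drawn only from its predecessors in the order."""
    total, parents = 0.0, {}
    for i, node in enumerate(order):
        allowed = set(order[:i])
        best = max((ps for ps in local_scores[node] if ps <= allowed),
                   key=lambda ps: local_scores[node][ps])
        parents[node] = best
        total += local_scores[node][best]
    return total, parents

def hill_climb(order):
    """Greedy search over orders; the neighbourhood is all adjacent swaps."""
    order = list(order)
    best, _ = score_order(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            cand = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            s, _ = score_order(cand)
            if s > best:
                order, best, improved = cand, s, True
    return order, best

print(hill_climb(["A", "B", "C"]))                                  # → (['B', 'A', 'C'], -28.5)
print(max(score_order(p)[0] for p in permutations(local_scores)))   # → -24.0
```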

Learning probability by comparison
Learning probability by probabilistic modeling is a major task in statistical machine learning
and it has traditionally been supported by maximum likelihood estimation applied to
generative models or by a local maximizer applied to discriminative models. In this talk, we
introduce a third approach, an innovative one that learns probability by comparing probabilistic
events. In our approach, we give the ranking of probabilistic events and the system
learns a probability distribution so that the ranking is well respected. We implemented
this approach in PRISM, a logic-based probabilistic programming language, and conducted
learning experiments with real data for models described by PRISM programs.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/sato17a.html

Learning Causal AMP Chain Graphs
Andersson-Madigan-Perlman chain graphs were originally introduced to represent independence models. They have recently been shown to be suitable for representing causal models with additive noise. In this paper, we present an algorithm for learning causal chain graphs. The algorithm builds on the ideas of Hoyer et al. (2009), i.e., it exploits the nonlinearities in the data to identify the direction of the causal relationships. We also report experimental results on real-world data.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/pena17b.html

Causal Effect Identification in Alternative Acyclic Directed Mixed Graphs
Alternative acyclic directed mixed graphs (ADMGs) are graphs that may allow causal effect identification in scenarios where Pearl's original ADMGs may not, and vice versa. Therefore, they complement each other. In this paper, we introduce a sound algorithm for identifying arbitrary causal effects from alternative ADMGs. Moreover, we show that the algorithm is complete for identifying the causal effect of a single random variable on the rest. We also show that the algorithm follows from a calculus similar to Pearl's do-calculus.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/pena17a.html

Consistent Learning Bayesian Networks with Thousands of Variables
We previously proposed a constraint-based method for learning Bayesian networks that uses the Bayes factor. Because a conditional independence test based on the Bayes factor is consistent, the method improves on the learning accuracy of traditional constraint-based learning methods. Additionally, the method is expected to learn larger network structures than traditional methods because it greatly improves computational efficiency. However, these expected benefits have not been demonstrated empirically. This report describes experiments on learning large network structures. The results show that the proposed method can learn surprisingly large networks with thousands of variables.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/natori17a.html
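A Bayes-factor independence test compares the marginal likelihood of a contingency table under a model with one Dirichlet over all joint cells (dependence) against the product of the marginal likelihoods of the two margins (independence). The sketch below uses a symmetric Dirichlet with a hypothetical per-cell weight `alpha`, which is not necessarily the prior used by the authors, and forms the conditional version by summing the log Bayes factor over the strata of the conditioning variable. All function names are assumptions.

```python
from math import lgamma

def log_dirichlet_multinomial(counts, alpha):
    """log P(sequence of categorical draws) under a symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))

def log_bayes_factor_dependence(table, alpha=1.0):
    """log BF of 'X and Y dependent' vs 'X independent of Y' for table[x][y].
    Positive values favour dependence."""
    joint = [c for row in table for c in row]
    x_marg = [sum(row) for row in table]
    y_marg = [sum(col) for col in zip(*table)]
    return (log_dirichlet_multinomial(joint, alpha)
            - log_dirichlet_multinomial(x_marg, alpha)
            - log_dirichlet_multinomial(y_marg, alpha))

def log_bf_conditional(tables_by_z, alpha=1.0):
    """Test X independent of Y given Z: sum the log BF over the tables for each value z."""
    return sum(log_bayes_factor_dependence(t, alpha) for t in tables_by_z)

print(log_bayes_factor_dependence([[40, 2], [3, 35]]))    # strongly associated table: positive
print(log_bayes_factor_dependence([[20, 20], [20, 20]]))  # perfectly balanced table: negative
```

Because the dependent model has more free parameters, it is penalised automatically when the data show no association, which is what makes this kind of test consistent as the sample grows.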

An Experimental Analysis of Anytime Algorithms for Bayesian Network Structure Learning
Bayesian networks are a widely used graphical model with diverse applications in knowledge discovery, classification,
and decision making. Learning a Bayesian network from discrete data can be cast as a combinatorial optimization problem and
thus solved using optimization techniques---the well-known \emph{score-and-search} approach. An important consideration
when applying a score-and-search method for Bayesian network structure learning (BNSL) is its anytime behavior, i.e., how
the quality of the solution found improves as a function of the amount of time given to the algorithm. Previous studies
of the anytime behavior of methods for BNSL are limited by the scale of the instances used in the evaluation and evaluate
only algorithms that do not scale to larger instances. In this paper, we perform an extensive evaluation of the
anytime behavior of the current state-of-the-art algorithms for BNSL. Our benchmark instances range from small (instances
with fewer than 20 random variables) to massive (instances with more than 1,500 random variables). We find that a local search
algorithm based on memetic search dominates the performance of other state-of-the-art algorithms when considering anytime behavior.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/lee17a.html

On the Sizes of Decision Diagrams Representing the Set of All Parse Trees of a Context-free Grammar
In this paper, we analyze the size of decision diagrams (DDs) representing the set of all parse trees of a context-free grammar (CFG).
CFGs are widely used in natural language processing and bioinformatics to estimate the hidden structures of sequence data.
A decision diagram is a data structure that represents a Boolean function in a concise form. By using DDs to represent the set of all parse trees, we can efficiently perform many useful operations over the parse trees, such as finding trees that satisfy additional constraints and finding the best parse tree.
Since the time complexity of these operations depends on DD size, selecting an appropriate DD variant is important.
Experiments on a simple CFG show that the Zero-suppressed Sentential Decision Diagram (ZSDD) is better than other DDs; we also give theoretical upper bounds on ZSDD size.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/kei-amii17a.html

Few-to-few Cross-domain Object Matching
Cross-domain object matching refers to the task of inferring unknown
alignment between objects in two data collections that do not have a
shared data representation. In recent years several methods have
been proposed for solving the special case that assumes each object
is to be paired with exactly one object, resulting
in a constrained optimization problem over permutations. A related
problem formulation of cluster matching seeks to match a cluster of
objects in one data set to a cluster of objects in the other set,
which can be considered a many-to-many extension of cross-domain
object matching and can be solved without explicit constraints. In
this work we study the intermediate region between these two special
cases, presenting a range of Bayesian inference algorithms that also work
for few-to-few cross-domain object matching problems, where
constrained optimization is necessary but the optimization domain is
broader than just permutations.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/jitta17a.html

Analyzing Tandem Mass Spectra: A Graphical Models Perspective
In the past two decades, the field of proteomics has seen explosive growth, largely due
to the development of tandem mass spectrometry (MS/MS). With a complex biological
sample as input, a typical MS/MS experiment quickly produces a large (often numbering
in the hundreds of thousands) collection of spectra representative of the proteins present
in the original complex sample. A majority of widely used methods to search and identify
MS/MS spectra use scoring functions which rely on static, hand-selected parameters rather
than affording the ability to learn parameters and adapt to the widely varying characteristics
of MS/MS data. In this talk, we discuss recent work utilizing dynamic Bayesian networks
(DBNs) to identify MS/MS spectra. In particular, we discuss a recently proposed DBN for
Rapid Identification of Peptides (DRIP) which, in contrast to popular scoring functions,
allows efficient generative and discriminative learning of parameters to achieve state-of-the-art
spectrum-identification accuracy. Furthermore, facilitated by DRIP's generative nature,
we present current innovations leveraging DBNs to significantly enhance many other aspects
of MS/MS analysis, such as improving downstream discriminative classification via detailed
feature extraction and speeding up identification runtime using trellises and approximate
inference.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/halloran17a.html

Learning Bayesian Network Parameters with Domain Knowledge and Insufficient Data
To improve the learning accuracy of parameters in a Bayesian network (BN) from limited
data, domain knowledge is often incorporated into the learning process as parameter
constraints. Maximum a posteriori (MAP) based methods that use both data and constraints
have been studied extensively. Among those methods, the qualitatively maximum a
posteriori (QMAP) method exhibits high learning performance. In the QMAP method, when
the data are limited, estimation from the data often fails to satisfy all the parameter
constraints, which makes the overall QMAP estimation unreliable. To ensure that the QMAP
estimation does not violate any given parameter constraint and to further improve the
learning accuracy, in this paper, we propose a qualitatively maximum a posteriori correction
(QMAP-C) estimation algorithm, which regulates QMAP estimation by replacing the data
estimation with a further constrained estimation. Experiments show that the proposed
algorithm outperforms most of the existing parameter learning methods when the parameter
constraints are correct.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/guo17a.html

Reducing the Cost of Probabilistic Knowledge Compilation
Bayesian networks (BNs) are a popular representation for reasoning under uncertainty. The computational complexity of inference, however, hinders their applicability to many real-world domains that in principle can be modeled by BNs. Inference methods based on Weighted Model Counting (WMC) reduce the cost of inference by exploiting patterns exhibited by the probabilities associated with BN nodes. However, these methods require a computationally intensive compilation step in search of these patterns, which limits the size of the BNs to which they can be applied. In this paper, we aim to extend WMC methods in general by proposing a scalable, language-agnostic compilation framework that solves this problem by partitioning BNs and compiling them as a set of smaller sub-problems. This reduces the cost of compilation and allows state-of-the-art innovations in WMC to be applied to a much larger range of Bayesian networks.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/giso-dal17a.html

Incorporating Uncertain Evidence Into Arithmetic Circuits Representing Probability Distributions
Arithmetic circuits have been used as tractable representations of probability distributions, generated either from models such as Bayesian networks, sum-product networks and Probabilistic Sentential Decision Diagrams, or directly from data. An interesting question is how we can incorporate uncertain evidence, which specifies that the marginal probabilities of a variable have to undergo certain changes, directly into an arithmetic circuit, and then perform reasoning on it to compute the probability distribution after incorporating this uncertain evidence. In this paper, we show that we can incorporate uncertain evidence on a variable by setting the indicators of this variable in the arithmetic circuit to non-negative values based on the likelihood ratios in Pearl's method of virtual evidence and the current marginal probabilities of this variable. For tractable computation of these marginal probabilities, the arithmetic circuit has to satisfy the properties of decomposability and smoothness, and we show that an algorithm using a downward pass can compute these marginal probabilities for all single variables. We also give a procedure for incorporating virtual evidence, including multiple pieces of virtual evidence.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/chan17a.html
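The indicator-setting idea can be checked on the network polynomial of a hypothetical two-node network A -> B, which is the function a smooth, decomposable arithmetic circuit for that network computes. Setting the indicators of B to the likelihood ratios target(b) / current(b) (Pearl-style virtual evidence) moves the marginal of B exactly to the target. In the sketch below the polynomial is evaluated directly with NumPy and marginals are obtained by re-evaluating it with indicators zeroed out, not by the downward pass described in the paper; all names and numbers are assumptions.

```python
import numpy as np

# Hypothetical network A -> B over binary variables.
theta_a = np.array([0.3, 0.7])                  # P(A)
theta_b_given_a = np.array([[0.9, 0.1],         # row a: P(B | A=a)
                            [0.2, 0.8]])

def circuit(lam_a, lam_b):
    """Network polynomial f = sum_{a,b} lam_a[a] lam_b[b] P(a) P(b | a),
    evaluated directly instead of through an explicit circuit."""
    return float(np.einsum("a,b,a,ab->", lam_a, lam_b, theta_a, theta_b_given_a))

def marginal_b(lam_a, lam_b):
    """Posterior marginal of B: lam_b[k] * df/dlam_b[k], normalised by f.
    Since f is linear in each indicator, lam_b[k] * df/dlam_b[k] equals f
    with every other B indicator set to 0."""
    f = circuit(lam_a, lam_b)
    return np.array([circuit(lam_a, lam_b * np.eye(2)[k]) for k in range(2)]) / f

ones = np.ones(2)
p_b = marginal_b(ones, ones)                    # current marginal of B (no evidence)
target = np.array([0.8, 0.2])                   # hypothetical marginal required by the uncertain evidence
lam_b = target / p_b                            # virtual-evidence likelihood ratios
print(p_b, marginal_b(ones, lam_b))             # the second vector matches `target`
```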

Backoff methods for estimating parameters of a Bayesian network
Various authors have highlighted inadequacies of BDeu-type scores, and this problem is
shared by parameter estimation. Basically, Laplace estimates work poorly, at least in part because
setting the prior concentration is challenging. In 1997, Friedman et al. suggested a simple
backoff approach for Bayesian network classifiers (BNCs). Backoff methods dominate
n-gram language models, with modified Kneser-Ney smoothing being the best known,
and a Bayesian variant exists in the form of Pitman-Yor process language models from
Teh in 2006. In this talk we will present some results on using backoff methods for Bayesian
network classifiers and Bayesian networks generally. For BNCs at least, the improvements
are dramatic and alleviate some of the issues of choosing too dense a network.
Sun, 03 Sep 2017 00:00:00 +0000
http://proceedings.mlr.press/v73/buntine17a.html
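A simple way to see what backing off means for a conditional probability table is a recursive interpolated estimate: P(x | u1, ..., uk) mixes the counts for the full parent context with the estimate that drops the last parent, bottoming out at a uniform distribution. The sketch below is such a generic interpolated backoff with a hypothetical strength `m`; it is not modified Kneser-Ney, a Pitman-Yor model, or necessarily the scheme presented in the talk, and the data and function name are assumptions.

```python
def backoff_prob(data, child, parents, values, x, context, m=2.0):
    """Estimate P(child = x | parents[:len(context)] = context) by recursive backoff.
    Counts at each level are interpolated with the estimate that drops the last
    parent; level 0 backs off to a uniform distribution over `values`."""
    k = len(context)
    rows = [r for r in data if tuple(r[p] for p in parents[:k]) == tuple(context)]
    n_u = len(rows)
    n_xu = sum(1 for r in rows if r[child] == x)
    if k == 0:
        lower = 1.0 / len(values)
    else:
        lower = backoff_prob(data, child, parents, values, x, context[:-1], m)
    # With no matching records this returns exactly the lower-order estimate.
    return (n_xu + m * lower) / (n_u + m)

# Hypothetical records for a classifier-style structure: class C, features X1 and X2.
data = [
    {"C": 0, "X1": 0, "X2": 0},
    {"C": 0, "X1": 0, "X2": 0},
    {"C": 0, "X1": 1, "X2": 1},
    {"C": 1, "X1": 1, "X2": 1},
    {"C": 1, "X1": 1, "X2": 1},
]
values = [0, 1]
# Context (C=1, X1=0) never occurs, so the estimate falls back to P(X2 = 1 | C = 1).
print(backoff_prob(data, "X2", ["C", "X1"], values, 1, (1, 0)))
```

Each level is a proper distribution over `values`, so the estimates for a given context always sum to one, and sparse parent contexts borrow strength from coarser ones instead of relying on a single prior concentration.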