Proceedings of Machine Learning Research

The 12th International Conference on Grammatical Inference
Held in Kyoto, Japan on 17-19 September 2014
Published as Volume 34 by the Proceedings of Machine Learning Research on 30 August 2014.
Volume Edited by: Alexander Clark, Makoto Kanazawa, Ryo Yoshinaka
Series Editors: Neil D. Lawrence, Mark Reid
https://proceedings.mlr.press/v34/
Wed, 08 Feb 2023 10:39:58 +0000

Induction of Directed Acyclic Word Graph in a Bioinformatics Task
In this paper a new algorithm for the induction of a Directed Acyclic Word Graph (DAWG) is proposed. A DAWG can serve as a very efficient data structure for lexicon representation and fast string matching, and has a variety of applications. Similar structures are being investigated in the theory of formal languages and grammatical inference, namely deterministic and nondeterministic finite automata (DFA and NFA, respectively). Since a DAWG is acyclic, the proposed method is suited for problems where the target language does not necessarily have to be infinite. The experiments have been performed for a dataset from the domain of bioinformatics, and our results are compared with those obtained using the current state-of-the-art methods in heuristic DFA induction.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/wieczorek14a.html

Evaluation of selection in context-free grammar learning systems
Grammatical inference deals with learning of grammars describing languages. Formal grammatical inference aims at identifying families of languages that have a shared property, which can be used to prove efficient learnability of the families formally. In contrast, in empirical grammatical inference research, practical systems are developed that are applied to languages.
The effectiveness of these systems is measured by comparing the learned grammar against a gold standard which indicates the ground truth. From successful empirical learnability results, either shared properties may be identified, leading to further formal learnability results, or modifications to the systems may be made, improving practical results. Proper evaluation of empirical systems is, therefore, essential. Here, we evaluate and compare existing state-of-the-art context-free grammar learning systems (and novel systems based on combinations of existing phases) in a standardized evaluation environment (on a corpus of plain natural language sentences), illustrating future directions for empirical grammatical inference research.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/vanzaanen14a.html

An example distribution for probabilistic query learning of simple deterministic languages
In this paper, we show a special example distribution on which the learner can guess a correct simple deterministic grammar in polynomial time from membership queries and random examples. First, we show a learning algorithm for simple deterministic languages from membership and equivalence queries. This algorithm does not run in polynomial time but, assuming a special example distribution, we can modify it into a polynomial-time probabilistic learning algorithm.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/tajima14a.html

Towards a rationalist theory of language acquisition
Recent computational, mathematical work on learnability extends to classes of languages that plausibly include the human languages, but there is nevertheless a gulf between this work and linguistic theory. The languages of the two fields seem almost completely disjoint and incommensurable.
This paper shows that this has happened, at least in part, because the recent advances in learnability have been misdescribed in two important respects. First, they have been described as resting on ‘empiricist’ conceptions of language, when actually, in fundamental respects that are made precise here, they are equally compatible with the ‘rationalist’, ‘nativist’ traditions in linguistic theory. Second, the recent mathematical proposals have sometimes been presented as if they not only advance but complete the account of human language acquisition, taking the rather dramatic difference between what current mathematical models can achieve and what current linguistic theories tell us as an indication that current linguistic theories are quite generally mistaken. This paper compares the two perspectives and takes some first steps toward a unified theory, aiming to identify some common ground where ‘rationalist’ linguistic hypotheses could directly address weaknesses in the current mathematical proposals.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/stabler14a.html

Bigger is Not Always Better: on the Quality of Hypotheses in Active Automata Learning
In Angluin’s L^* algorithm a learner constructs a sequence of hypotheses in order to learn a regular language. Each hypothesis is consistent with a larger set of observations and is described by a bigger model. From a behavioral perspective, however, a hypothesis is not always better than the previous one, in the sense that the minimal length of a counterexample that distinguishes a hypothesis from the target language may decrease. We present a simple modification of the L^* algorithm that ensures that for subsequent hypotheses the minimal length of a counterexample never decreases, which implies that the distance to the target language never increases in a corresponding ultrametric.
Preliminary experimental evidence suggests that our algorithm speeds up learning in practical applications by reducing the number of equivalence queries.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/smetsers14a.html

Inferring (k,l)-context-sensitive probabilistic context-free grammars using hierarchical Pitman-Yor processes
Motivated by the idea of applying nonparametric Bayesian models to dual approaches for distributional learning, we define (k,l)-context-sensitive probabilistic context-free grammars (PCFGs) using hierarchical Pitman-Yor processes (PYPs). The data sparseness problem that occurs when inferring context-sensitive probabilities for rules is handled by the smoothing effect of hierarchical PYPs. Many possible definitions or constructions of PYP hierarchies can be used to represent the context sensitivity of derivations of CFGs in Chomsky normal form. In this study, we use a definition that is considered to be the most natural as an extension of infinite PCFGs defined in previous studies. A Markov Chain Monte Carlo method called blocked Metropolis-Hastings (MH) sampling is known to be effective for inferring PCFGs from unsupervised sentences. Blocked MH sampling is applicable to (k,l)-context-sensitive PCFGs by modifying their so-called inside probabilities. We show that the computational cost of blocked MH sampling for (k,l)-context-sensitive PCFGs is O(|V|^{l+3} |s|^3) for each sentence s, where V is a set of nonterminals. This cost is too high to run a sufficient number of sampling iterations, especially when l ≠ 0; we therefore propose an alternative sampling method that separates the sampling procedure into pointwise sampling for nonterminals and blocked sampling for rules. The computational cost of this sampling method is O(min{|s|^l, |V|^l} (|V||s|^2 + |s|^3)).
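To get a feel for the gap between the two bounds quoted in the abstract above, the following minimal sketch evaluates only their leading terms, reading the first bound as |V|^(l+3) |s|^3. The function names and the sample values of |V|, |s| and l are illustrative assumptions, and constant factors hidden by the O-notation are ignored, so the numbers indicate growth rates rather than actual running times.

```python
def blocked_mh_cost(num_nonterminals: int, sentence_len: int, l: int) -> int:
    """Leading term |V|^(l+3) * |s|^3 of the blocked MH bound (constants dropped)."""
    return num_nonterminals ** (l + 3) * sentence_len ** 3


def split_sampling_cost(num_nonterminals: int, sentence_len: int, l: int) -> int:
    """Leading term min(|s|^l, |V|^l) * (|V||s|^2 + |s|^3) (constants dropped)."""
    v, s = num_nonterminals, sentence_len
    return min(s ** l, v ** l) * (v * s ** 2 + s ** 3)


if __name__ == "__main__":
    # Hypothetical sizes: 20 nonterminals, a 10-word sentence.
    for l in (0, 1, 2):
        a = blocked_mh_cost(20, 10, l)
        b = split_sampling_cost(20, 10, l)
        print(f"l={l}: blocked={a:,}  split={b:,}")
```

For these sample sizes the first expression grows by a factor of |V| with each increment of l, while the second grows by min(|s|, |V|), which is the effect the abstract attributes to separating pointwise and blocked sampling.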
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/shibata14a.html

Grammatical Inference of some Probabilistic Context-Free Grammars from Positive Data using Minimum Satisfiability
Recently, different theoretical learning results have been found for a variety of context-free grammar subclasses through the use of distributional learning (Clark, 2010b). However, these results have not yet been extended to probabilistic grammars. In this work, we give a practical algorithm, with some proven properties, that learns a subclass of probabilistic grammars from positive data. A minimum satisfiability solver is used to direct the search towards small grammars. Experiments on typical context-free languages and artificial natural language grammars give positive results.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/scicluna14a.html

Grammar Compression: Grammatical Inference by Compression and Its Application to Real Data
A grammatical inference algorithm tries to find as small a grammar as possible representing a potentially infinite sequence of strings. Here, let us consider a simple restriction: the input is a finite sequence or it might be a singleton set. The restricted problem, finding the smallest CFG that generates just the input, is called grammar compression. In the last decade many researchers have tackled this problem because of its scalable applications, e.g., expansion of data storage capacity, speeding up information retrieval, DNA sequencing, frequent pattern mining, and similarity search. We review the history of grammar compression and its wide applications, together with an important direction for future work. The study of grammar compression began with bad news: the smallest CFG problem is NP-hard. Hence, the first question is: Can we get a near-optimal solution in polynomial time?
(Is there a reasonable approximation algorithm?) And the next question is: Can we minimize the costs of time and space? (Does a linear time algorithm exist within an optimal working space?) The recent results produced by the research community answer these questions affirmatively. We introduce several important results and typical applications to a huge text collection. On the other hand, the data explosion erodes the advantage of grammar compression, since there is no working space to store all the data supplied by a data stream. The last question is: How can we handle the stream data? For this question, we propose the framework of stream grammar compression for the next generation and its attractive application to fast data transmission.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/sakamoto14a.html

Maximizing a Tree Series in the Representation Space
This paper investigates the use of linear representations of trees (i.e., mappings from the set of trees into a finite dimensional vector space which are induced by rational series on trees) in the context of structured data learning. We argue that this representation space can be more appealing than the space of trees to handle machine learning problems involving trees. Focusing on a tree series maximization problem, we first analyze its complexity to motivate the use of approximation techniques. We then show how a tree series can be extended to the continuous representation space, propose an adaptive Metropolis-Hastings algorithm to solve the maximization problem in this space, and establish convergence guarantees. Finally, we provide some experiments comparing our algorithm with an implementation of the Metropolis-Hastings algorithm in the space of trees.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/rabusseau14a.html

Learning Nondeterministic Mealy Machines
In applications where abstract models of reactive systems are to be inferred, one important challenge is that the behavior of such systems can be inherently nondeterministic. To cope with this challenge, we developed an algorithm to infer nondeterministic computation models in the form of Mealy machines. We introduce our approach and provide extensive experimental results to assess its potential in the identification of black-box reactive systems. The experiments involve both artificially generated abstract Mealy machines and the identification of a TFTP server model starting from a publicly available implementation.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/khalili14a.html

Very efficient learning of structured classes of subsequential functions from positive data
In this paper, we present a new algorithm that can identify in polynomial time and data using positive examples any class of subsequential functions that share a particular finite-state structure. While this structure is given to the learner a priori, it allows for the exact learning of partial functions, and both the time and data complexity of the algorithm are linear. We demonstrate the algorithm on examples from natural language phonology and morphology in which the needed structure has been argued to be plausibly known in advance. A procedure for making any subsequential transducer onward without changing its structure is also presented.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/jardine14a.html

An Abstract Framework for Counterexample Analysis in Active Automata Learning
Counterexample analysis has emerged as one of the key challenges in Angluin-style active automata learning. Rivest and Schapire (1993) showed for the L^* algorithm that a single suffix of the counterexample was sufficient to ensure progress. This suffix can be obtained in a binary search fashion, requiring Θ(log m) membership queries for a counterexample of length m. Correctly implementing this algorithm can be quite tricky, and its correctness has sometimes even been disputed. In this paper, we establish an abstract framework for counterexample analysis, which basically reduces the problem of finding a suffix to finding distinct neighboring elements in a 0/1 sequence, where the first element is 0 and the last element is 1. We demonstrate the conciseness and simplicity of our framework by using it to present new counterexample analysis algorithms, which, while maintaining the worst-case complexity of O(log m), perform significantly better in practice. Furthermore, in a second instantiation of our framework that highlights its generality, we contribute the first sublinear counterexample analysis procedures for the algorithm due to Kearns and Vazirani (1994).
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/isberner14a.html

Some improvements of the spectral learning approach for probabilistic grammatical inference
Spectral methods offer new and elegant solutions in probabilistic grammatical inference. We propose two ways to improve them. First, we show how a linear representation, or equivalently a weighted automaton, output by the spectral learning algorithm can be taken as an initial point for the Baum-Welch algorithm, in order to increase the likelihood of the observation data.
Secondly, we show how the inference problem can naturally be expressed in the framework of Structured Low-Rank Approximation. Both ideas are tested on a benchmark extracted from the PAutomaC challenge.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/gybels14a.html

A bottom-up efficient algorithm learning substitutable languages from positive examples
Based on Harris’s substitutability criterion, the recent definitions of classes of substitutable languages have led to interesting polynomial learnability results for expressive formal languages. These classes are also promising for practical applications: in natural language analysis, because definitions have strong linguistic support, but also in biology for modeling protein families, as suggested in our previous study introducing the class of local substitutable languages. But turning recent theoretical advances into practice requires truly practical algorithms. We present here an efficient learning algorithm, motivated by intelligibility and parsing efficiency of the result, which directly reduces the positive sample into a small non-redundant canonical grammar of the target substitutable language. Thanks to this new algorithm, we have been able to extend our experimentation to a complete protein dataset, confirming that it is possible to learn grammars on proteins with high specificity and good sensitivity by a generalization based on local substitutability.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/coste14a.html

Preface
Preface for the 12th International Conference on Grammatical Inference.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/clark14a.html

A Canonical Semi-Deterministic Transducer
We prove the existence of a canonical form for semi-deterministic transducers with sets of pairwise incomparable output strings. Based on this, we develop an algorithm which learns semi-deterministic transducers given access to translation queries. We also prove that there is no learning algorithm for semi-deterministic transducers that uses only domain knowledge.
Sat, 30 Aug 2014 00:00:00 +0000 https://proceedings.mlr.press/v34/beros14a.html
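
The counterexample-analysis abstract earlier in this listing (isberner14a) reduces suffix finding to locating two distinct neighboring elements in a 0/1 sequence whose first element is 0 and whose last element is 1. The sketch below shows only that reduced problem, solved with a plain binary search. It is not one of the algorithms from the paper; the name `find_flip` and the callable `alpha` are hypothetical, and in a learning setting each call to `alpha` would stand in for whatever queries the learner issues.

```python
from typing import Callable


def find_flip(alpha: Callable[[int], int], m: int) -> int:
    """Return an index i with alpha(i) == 0 and alpha(i + 1) == 1.

    Assumes m >= 1, alpha(0) == 0 and alpha(m) == 1; the endpoints are
    taken on trust and never evaluated. Uses about log2(m) calls to alpha.
    """
    lo, hi = 0, m  # invariant: alpha(lo) == 0 and alpha(hi) == 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if alpha(mid) == 0:
            lo = mid  # a 0 -> 1 flip must occur somewhere in [mid, hi]
        else:
            hi = mid  # a 0 -> 1 flip must occur somewhere in [lo, mid]
    return lo


if __name__ == "__main__":
    # The sequence need not be monotone; any 0 -> 1 neighbor pair is acceptable.
    seq = [0, 1, 0, 0, 1, 1, 0, 1]
    i = find_flip(lambda k: seq[k], len(seq) - 1)
    print(i, seq[i], seq[i + 1])
```

Because the invariant only requires a 0 at the left end and a 1 at the right end of the current interval, the search is correct even when the sequence flips several times, which is why the endpoint condition alone suffices.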