Proceedings of Machine Learning Research

A Toolbox for Context-Sensitive Grammar Induction by Genetic Search

Wed, 25 Aug 2021 00:00:00 +0000

This paper introduces a new tool for context-sensitive grammar inference. The source code and library are publicly available via GitHub and NuGet repositories. The article describes the implemented algorithm, input parameters, and the produced output grammar. In addition, the paper contains several use-cases. The described library is written in F#, hence it can be used in any .NET Framework language (F#, C#, C++/CLI, Visual Basic, and J#) and run under the control of varied operating systems.

Query Learning Algorithm for Symbolic Weighted Finite Automata

Wed, 25 Aug 2021 00:00:00 +0000

We propose a query learning algorithm for an extension of weighted finite automata (WFAs), named symbolic weighted finite automata (SWFAs), which can handle strings over infinite alphabets more efficiently. Based on the idea of symbolic finite automata, SWFAs generalize WFAs by allowing transitions to be functions from a possibly infinite alphabet to weights. Our algorithm can learn SWFAs if functions in transitions are also learnable by queries. We also investigate minimization and equivalence checking for SWFAs.

Investigating Backpropagation Alternatives when Learning to Dynamically Count with Recurrent Neural Networks

Wed, 25 Aug 2021 00:00:00 +0000

Artificial neural networks, such as recurrent neural networks (RNNs) and transformers, have empirically demonstrated impressive performance across many natural language processing tasks. However, automated text processing at a deeper and more interpretable level arguably requires extracting intricate patterns such as underlying grammatical structures. As a result, correctly interpreting a neural language model would require an understanding of linguistic structure through formal language theory. Nonetheless, there is often a discrepancy between theoretical and practical findings that restrict models informed by formal language theory in real-life scenarios. For instance, while learning context-free grammars (CFGs), existing neural models fall short because they lack appropriate memory structures. In this work, we investigate how learning algorithms affect the generalization ability of RNNs that are designed to learn context-free languages (CFGs) as well as their ability to encode hierarchical representations. To do so, we investigate a range of learning algorithms on complex, context-free languages such as the Dyck languages, with a focus on the RNN’s ability to generalize to longer sequences when processing a CFG. Our results demonstrate that a Long Short-term memory (LSTM) RNN equipped with second-order connections, trained with the sparse attentive backtracking (SAB) algorithm, performs stably across various Dyck languages and successfully emulates real-time-counter machines. We empirically show that RNNs without external memory are incapable of recognizing Dyck-2 languages, which require a stack-like structure. We finally investigate each learning algorithm’s performance on real-world language modeling tasks using the Penn Tree Bank and text8 benchmarks. We further investigate how an increase in model parameters affects each RNN’s stability and grammar recognition performance when trained using different learning algorithms.

Recognizing Long Grammatical Sequences using Recurrent Networks Augmented with an External Differentiable Stack

Wed, 25 Aug 2021 00:00:00 +0000

Recurrent neural networks (RNNs) are widely used for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and voice recognition, these stateful models have several critical shortcomings. Specifically, RNNs struggle to recognize very long sequences, which limits their applicability in many important temporal processing and time series forecasting problems. For example, RNNs struggle in recognizing complex context free languages (CFLs), unable to reach 100% accuracy on the training set. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on complex CFLs nor tested on very long sequences that have been seen on training. In fact, earlier work has shown that continuous differentiable memory structures struggle in recognizing complex CFLs over very long sequences. In this paper, we improve the memory-augmented RNN with new architectural and state updating mechanisms that learn to properly balance the use of latent states with external memory. Our improved RNN models exhibit improved generalization and are able to classify long strings generated by complex hierarchical context free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modelling dataset, achieving stable, robust performance on these benchmarks. Furthermore, we show that our proposed memory-augmented networks are able retain information over long sequences leading to improved generalization for strings up to length $160$.

Grammar Interpretations and Learning TSL Online

Wed, 25 Aug 2021 00:00:00 +0000

The tier-based strictly local ($\TSL{}$) languages are a class of formal languages that, alongside the strictly piecewise class, effectively model some long-distance generalizations in natural language (Heinz et al., 2011). Two learning algorithms for $\TSL{}$ already exist: one by Jardine and Heinz (2016) and one by Jardine and McMullin (2017). The former is limited in that it cannot learn all $\TSL{}$ patterns. The latter is restricted to a batch-learning environment. We present a general algorithm without these limitations. In particular we show that $\TSL{}$ is efficiently learnable online via reinterpretation of a strictly local grammar, and this mechanism generalizes to the strictly piecewise class as well. However we note that the known $\TSL{}$ learning algorithms are not robust in the face of interaction with other constraints, posing a challenge for the utility of this class for phonotactics.

Extracting Weighted Automata for Approximate Minimization in Language Modelling

Wed, 25 Aug 2021 00:00:00 +0000

In this paper we study the approximate minimization problem for language modelling. We assume we are given some language model as a black box. The objective is to obtain a weighted finite automaton (WFA) that fits within a given size constraint and which mimics the behaviour of the original model while minimizing some notion of distance between the black box and the extracted WFA. We provide an algorithm for the approximate minimization of black boxes trained for language modelling of sequential data over a one-letter alphabet. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory. This allows us to use the spectral norm to measure the distance between the black box and the WFA. We provide theoretical guarantees to study the potentially infinite-rank Hankel matrix of the black box, without accessing the training data, and we prove that our method returns an asymptotically-optimal approximation.

A Hierarchy of Context-Free Languages Learnable from Positive Data and Membership Queries

Wed, 25 Aug 2021 00:00:00 +0000

We consider a generalization of the “dual” approach to distributional learning of context-free grammars, where each nonterminal $A$ is associated with a string set $X_A$ characterized by a finite set $C$ of contexts. Rather than letting $X_A$ be the set of all strings accepted by all contexts in $C$ as in previous works, we allow more flexible uses of the contexts in $C$, using some of them positively (contexts that accept the strings in $X_A$) and others negatively (contexts that do not accept any strings in $X_A$). The resulting more general algorithm works in essentially the same way as before, but on a larger class of context-free languages.

Inside-Outside Algorithm for Macro Grammars

Wed, 25 Aug 2021 00:00:00 +0000

We propose an inside-outside algorithm for stochastic macro grammars. Our approach is based on types, which has been inspired by type-based approaches to reasoning about functional programs and higher-order grammars. By considering type derivations instead of ordinary word derivation sequences, we can naturally extend the standard inside-outside algorithm for stochastic context-free grammars to obtain the algorithm for stochastic macro grammars. We have implemented the algorithm and confirmed its effectiveness through an application to the learning of macro grammars.

Learning Input Strictly Local Functions From Their Composition

Wed, 25 Aug 2021 00:00:00 +0000

This paper studies the learning of two functions given positive samples of their composition, motivated by an empirical problem in natural language phonology. Empirically relevant conditions under which this is possible are identified and a provably correct algorithm is given that can semi-strongly identify the two functions in polynomial time and data. In order to clearly illustrate the learning problem and related concepts, we focus on a simple subset of input strictly 2-local functions. But we further argue that the general learning procedure we propose can be extended to more general classes of functions.

Learning DFAs by Evolving Short Sequences of Merges

Wed, 25 Aug 2021 00:00:00 +0000

The grammatical inference community has been studying evolutionary methods for DFA learning for almost three decades. These methods typically operate by learning a representation of the target DFA either as a partitioning the states of a prefix tree acceptor or as an encoding of its transition matrix. In this paper, we present an alternative approach for learning random DFAs over binary alphabets from sparse training data. We first conducted several experiments on thousands of problem instances to study their behaviour and to better understand the conditions under which state merging algorithms succeed or fail. Motivated by these observations, we implemented an evolutionary algorithm in which the chromosomes encode short sequences of merges selected from a subset of high state-reduction merges. The fitness of a chromosome is then measured by extending it using the EDSM heuristic and the size of the final hypothesis is used to score the entire sequence. To improve runtime performance, we use a method that can reliably estimate the fitness of a sequence of merges without extending it completely. We use the state-of-the-art EDSM algorithm as a baseline to compare our results to and observe that we can find low-error hypotheses or the exact target DFAs with a considerably higher likelihood.

The complexity of learning linear temporal formulas from examples

Wed, 25 Aug 2021 00:00:00 +0000

In this paper we initiate the study of the computational complexity of learning linear temporal logic (LTL) formulas from examples. We construct approximation algorithms for fragments of LTL and prove hardness results; in particular we obtain tight bounds for the fragment containing only the next operator and conjunctions, and prove NP-hardness results for many fragments.

Using Grammatical Inference to Build Privacy Preserving Data-sets of User Logs

Wed, 25 Aug 2021 00:00:00 +0000

In many web applications, user logs are extracted to build a user model which can be part of further development, recommendation systems or personalization. This is the case for education platforms like X5GON. In order to obtain community collaboration, these logs should be shared, but logical privacy issues arise. In this work, we propose to build a user model from a data-set of logs: this will be a timed and probabilistic $k$-testable automaton, which can then be used to generate a new data-set having statistically close characteristics, yet have in which the original sequences have been sufficiently chunked the original data to not be able to identify the original logs. Following ideas from Differencial Privacy, we provide a second algorithm allowing to eliminate any strings whose influence would be too great. Experiments validate the approach.

Beyond Chomsky Normal Form: Extending Strong Learning Algorithms for PCFGs

Wed, 25 Aug 2021 00:00:00 +0000

We extend a recent consistent strong learning algorithm for a subclass of probabilistic context-free grammars in Chomsky normal form, (Clark and Fijalkow, 2020) to a much larger class of grammars by weakening the normal form constraints to allow for a richer class of derivation structures that are not necessarily binary branching, including among other possibilities unary branching trees which introduce some technical difficulties. By modifying the structural conditions appropriately we obtain an algorithm which is computationally efficient, and a consistent estimator for the class of grammars defined.

Preface

Wed, 25 Aug 2021 00:00:00 +0000

Preface to the proceedings.

Learning Multiple Independent Tier-based Processes

Wed, 25 Aug 2021 00:00:00 +0000

Work on the learnability of tier-based string-to-string functions has so far been limited to those that operate over a single tier. Such functions are, however, generally incapable of modelling multiple simultaneous processes. It is relatively easy to define a class of multi-tiered functions that can handle more than one process at a time, but doing so has negative consequences for learnability. Namely, we conjecture that it is difficult if not impossible to efficiently learn any arbitrary set of tiers from positive data alone. We thus describe the strongly target-specified subclass of multi-tiered functions, whose tiers can be efficiently identified from positive data. In these functions, each input element is associated with a single tier that on its own can fully determine what the element is mapped to. When the tiers act independently in this way, we can learn them in isolation from each other. A transducer representation of the target function can then be constructed using the discovered tiers.

Extracting Context-Free Grammars from Recurrent Neural Networks using Tree-Automata Learning and A* Search

Wed, 25 Aug 2021 00:00:00 +0000

This paper presents (i) an active learning algorithm for visibly pushdown grammars and (ii) shows its applicability for learning surrogate models of recurrent neural networks (RNNs) trained on context-free languages. Such surrogate models may be used for verification or explainability. Our learning algorithm makes use of the proximity of visibly pushdown languages and regular tree languages and builds on an existing learning algorithm for regular tree languages. Equivalence tests between a given RNN and a hypothesis grammar rely on a mixture of A* search and random sampling. An evaluation of our approach on a set of RNNs from the literature shows good preliminary results.