Proceedings of Machine Learning Research
Proceedings of the 16th edition of the International Conference on Grammatical Inference
Held in Rabat, Morocco on 10-13 July 2023
Published as Volume 217 by the Proceedings of Machine Learning Research on 05 July 2023.
Volume Edited by:
François Coste
Faissal Ouardi
Guillaume Rabusseau
Series Editors:
Neil D. Lawrence
https://proceedings.mlr.press/v217/
String Extension Learning Despite Noisy Intrusions
We examine the conditions in which string extension learning algorithms are able to identify classes of formal languages in the limit from noisy data presentations in polynomial time. A data presentation for a formal language $L$ is noisy if it contains words belonging to the complement of $L$. In the general case, string extension learners cannot distinguish noise from true examples and are led astray. The main result is that relative frequencies can be used to distinguish noisy examples from true examples, provided the data presentations are constrained to those in which relative frequencies are uniformly present and exceed the rate at which noise is introduced.
https://proceedings.mlr.press/v217/wu23a.html
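Here is a minimal sketch of the idea. It assumes the class is the strictly k-local (SL-k) languages, where a grammar is a set of permitted k-factors. The plain string extension learner takes the union of all observed factors. A frequency-aware variant keeps only factors whose relative frequency reaches a threshold. Two choices are illustrative assumptions, not necessarily the paper's exact definitions:
- "Relative frequency" is the share of words in the presentation that contain a factor.
- The threshold 0.2 is a hypothetical value chosen above the noise rate.

All function names are also illustrative.

```python
from collections import Counter


def k_factors(word: str, k: int) -> set[str]:
    """k-factors of word, padded with the boundary markers '<' and '>'."""
    padded = "<" + word + ">"
    if len(padded) <= k:
        return {padded}  # short words contribute their whole padded form
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def learn_exact(sample: list[str], k: int) -> set[str]:
    """Plain string extension learner: the union of all observed k-factors."""
    grammar: set[str] = set()
    for word in sample:
        grammar |= k_factors(word, k)
    return grammar


def learn_thresholded(sample: list[str], k: int, threshold: float) -> set[str]:
    """Keep only factors contained in at least `threshold` of the words
    (assumed notion of relative frequency); rarer factors count as noise."""
    counts: Counter[str] = Counter()
    for word in sample:
        counts.update(k_factors(word, k))
    return {f for f, c in counts.items() if c / len(sample) >= threshold}


def accepts(grammar: set[str], word: str, k: int) -> bool:
    """A word is in the SL-k language iff all of its k-factors are licensed."""
    return k_factors(word, k) <= grammar


# Usage: "ba" is a noisy intrusion into a presentation of (ab)+.
sample = ["ab", "abab", "ababab"] * 3 + ["ba"]
print(accepts(learn_exact(sample, 2), "ba", 2))             # True: led astray
print(accepts(learn_thresholded(sample, 2, 0.2), "ba", 2))  # False: noise filtered
```

The noisy factors `<b` and `a>` occur in only 1 of 10 words, so the thresholded learner discards them and recovers the grammar of (ab)+.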
Learning Transductions and Alignments with RNN Seq2seq Models
The paper studies the capabilities of recurrent neural network sequence-to-sequence (RNN seq2seq) models in learning four transduction tasks: identity, reversal, total reduplication, and quadratic copying. These transductions have traditionally been well studied under finite-state transducers and are considered to be of increasing complexity. We find that RNN seq2seq models are only able to approximate a mapping that fits the training or in-distribution data, instead of learning the underlying functions. Although attention makes learning more efficient and robust, it does not overcome the out-of-distribution generalization limitation. We establish a novel complexity hierarchy for learning the four tasks with attention-less RNN seq2seq models, which may be understood in terms of the complexity hierarchy of formal languages instead of string transductions. RNN variants also play a role in the results. In particular, we show that Simple RNN seq2seq models cannot count the input length.
https://proceedings.mlr.press/v217/wang23a.html
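The four target functions can be pinned down in a few lines. This sketch assumes that quadratic copying maps $w$ to $w^{|w|}$, that is, the word repeated $|w|$ times. Under that assumption, output lengths grow as $n$, $n$, $2n$ and $n^2$ for an input of length $n$. The function names are illustrative and not taken from the paper.

```python
def identity(w: str) -> str:
    return w  # w -> w


def reversal(w: str) -> str:
    return w[::-1]  # w -> reverse(w)


def total_reduplication(w: str) -> str:
    return w + w  # w -> ww, output length 2|w|


def quadratic_copying(w: str) -> str:
    return w * len(w)  # w -> w^{|w|}, output length |w|^2 (assumed definition)


for f in (identity, reversal, total_reduplication, quadratic_copying):
    print(f.__name__, f("abc"))
# identity abc
# reversal cba
# total_reduplication abcabc
# quadratic_copying abcabcabc
```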
Detecting Changes in Loop Behavior for Active Learning
Active automaton learning is a popular approach for building models of software systems. The approach forms a hypothesis model from observations and then performs a heuristic equivalence query to check whether the learned model is equal to the model under test. Current methods for equivalence queries, however, often fail to find counterexamples when they encounter loops, one of the most common control structures in software. We introduce two novel equivalence checkers that better handle loops: one extends the well-known W-Method, and the other uses symbolic execution. Both methods are tested on RERS challenge problems. We show that our approaches find more counterexamples on suitable problems and thus learn more accurate models. We further test our symbolic execution approach outside active learning and show that it finds more errors than the state-of-the-art tool KLEE on several problems.
https://proceedings.mlr.press/v217/verboom23a.html
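For reference, here is a sketch of the classic W-method (Chow's construction) that the first checker extends. It assumes three inputs:
- access sequences for each hypothesis state;
- a characterization set $W$;
- a bound on the number of extra states in the system under test.

The suite is $P \cdot \Sigma^{\le m-n} \cdot W$, where $P$ is the transition cover. This is only the textbook method, not the loop-aware extension proposed in the paper. The function names and toy inputs are hypothetical.

```python
from itertools import product
from typing import Iterable, Sequence

Word = tuple[str, ...]


def words_up_to(alphabet: Sequence[str], max_len: int) -> list[Word]:
    """All words over the alphabet of length 0..max_len."""
    return [tuple(w) for n in range(max_len + 1) for w in product(alphabet, repeat=n)]


def w_method_suite(access: Iterable[Word], alphabet: Sequence[str],
                   w_set: Iterable[Word], extra_states: int) -> set[Word]:
    """Classic W-method test suite P . Sigma^{<= extra_states} . W."""
    access = [tuple(p) for p in access]
    w_set = [tuple(w) for w in w_set]
    # Transition cover P: every access sequence plus each one-letter extension.
    cover = set(access) | {p + (a,) for p in access for a in alphabet}
    middles = words_up_to(alphabet, extra_states)
    return {p + m + w for p in cover for m in middles for w in w_set}


# Hypothetical 2-state hypothesis with access sequences "" and "a", W = {"a"}.
suite = w_method_suite([(), ("a",)], ["a", "b"], [("a",)], extra_states=0)
print(len(suite))  # 5
```

Each extra state assumed in the system under test multiplies the middle part by the alphabet size. Long loops therefore quickly make the classic suite expensive, which is the situation the paper targets.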
Benchmarking State-Merging Algorithms for Learning Regular Languages
The state-merging algorithms RPNI, EDSM, and ALERGIA are tested on MLRegTest, a benchmark for the learning of regular languages [heinz-etal-2022-mlregtest]. MLRegTest contains training, development, and test data for 1,800 regular languages, which themselves are drawn from several well-studied subregular classes. The results show large variation in the performance of these algorithms on the benchmark, with EDSM performing best overall. Furthermore, the mean accuracies on the test data for all three state-merging algorithms are lower than the mean accuracies obtained by the neural networks studied in [heinz-etal-2022-mlregtest]. A further experiment augments the training data in MLRegTest with shorter strings and shows that they dramatically improve the performance of the state-merging algorithms.
https://proceedings.mlr.press/v217/soubki23a.html
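RPNI and EDSM both start from an augmented prefix tree acceptor (APTA) built from labelled samples and then merge its states. ALERGIA instead starts from a frequency prefix tree built from positive data. Below is a minimal sketch of the APTA construction, assuming string samples and a simple node type (all names hypothetical). The merging phase, where the three algorithms differ, is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # symbol -> Node
    label: Optional[bool] = None  # True = accept, False = reject, None = unknown


def build_apta(positive: list[str], negative: list[str]) -> Node:
    """Augmented prefix tree acceptor: one state per prefix in the sample."""
    root = Node()
    for words, label in ((positive, True), (negative, False)):
        for word in words:
            node = root
            for symbol in word:
                node = node.children.setdefault(symbol, Node())
            if node.label is not None and node.label != label:
                raise ValueError(f"{word!r} is both positive and negative")
            node.label = label
    return root


def count_states(node: Node) -> int:
    """Number of states in the tree rooted at node."""
    return 1 + sum(count_states(child) for child in node.children.values())


apta = build_apta(positive=["a", "ab"], negative=["b"])
print(count_states(apta))  # 4: one state per prefix "", "a", "ab", "b"
```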
fAST: regular expression inference from positive examples using Abstract Syntax Trees
Our paper presents a new algorithm that infers a regular expression matching a given set of strings known as positive examples (strings that belong to the target language; negative examples are strings that do not). This algorithm has practical applications in automating file parsing for files with an unknown template. In practice, prior works rarely apply, because they require negative examples and scale poorly. By restricting to positive examples, the problem becomes especially challenging: many regular expressions can match the set of positive examples, but only a few are useful in practice. To assess the quality of a regular expression, we introduce two performance metrics, called accuracy and conciseness. The contributions of the paper are threefold. First, we introduce an algorithm that infers a regular expression from positive examples only while optimizing accuracy and conciseness. Second, we adapt this algorithm to generate a regular expression based on a set of predefined patterns. Third, we demonstrate the tractability and usefulness of our solution through experiments on synthesized and real-world datasets.
https://proceedings.mlr.press/v217/raynal23a.html
Identification of Substitutable Context-Free Languages over Infinite Alphabets from Positive Data
This paper is concerned with the identification in the limit from positive data of substitutable context-free languages (CFLs) over infinite alphabets. Clark and Eyraud (2007) showed that substitutable CFLs over finite alphabets are learnable in this learning paradigm. We show that substitutable CFLs generated by grammars whose production rules may have predicates, which compactly represent sets of potentially infinitely many terminal symbols, are learnable if the terminal symbol sets represented by those predicates are learnable, under a certain condition. This can be seen as a result parallel to Argyros and D'Antoni's work (2018), which amplifies the query learnability of predicate classes to that of symbolic automata classes. Our result is the first to show that such amplification is possible for identifying some CFLs in the limit from positive data.
https://proceedings.mlr.press/v217/numaya23a.html
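To illustrate the finite-alphabet setting of Clark and Eyraud (2007) that this work generalizes, here is a sketch of the weak-substitutability step. Two substrings of the sample that occur in a common context $(l, r)$ are put in the same class, and classes are closed transitively with a union-find. The sketch assumes a finite alphabet of characters. It omits grammar construction and the paper's predicates over infinite alphabets. The function names are illustrative.

```python
from collections import defaultdict


def substrings_with_contexts(sample: list[str]) -> dict:
    """Map each nonempty substring of the sample to its set of (left, right) contexts."""
    contexts = defaultdict(set)
    for w in sample:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                contexts[w[i:j]].add((w[:i], w[j:]))
    return contexts


def weak_substitutability_classes(sample: list[str]) -> list[list[str]]:
    """Group substrings that share at least one context (transitively closed)."""
    contexts = substrings_with_contexts(sample)
    parent = {s: s for s in contexts}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    by_context = defaultdict(list)
    for s, ctxs in contexts.items():
        for c in ctxs:
            by_context[c].append(s)
    for members in by_context.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    classes = defaultdict(set)
    for s in contexts:
        classes[find(s)].add(s)
    return sorted(sorted(c) for c in classes.values())


# "c" and "acb" share the empty context ("", ""), as in the language a^n c b^n.
print(weak_substitutability_classes(["c", "acb"]))
# [['a'], ['ac'], ['acb', 'c'], ['b'], ['cb']]
```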
Testing-based Black-box Extraction of Simple Models from RNNs and Transformers
In this technical report, we outline the testing-based black-box method used to extract simple and interpretable models from RNNs and transformers. Our work was done in the scope of the TAYSIR competition, in which it won first place.
https://proceedings.mlr.press/v217/muskardin23a.html
Formal languages and neural models for learning on sequences
The empirical success of deep learning in NLP and related fields motivates understanding the model of grammar implicit within neural networks on a theoretical level. In this tutorial, I will overview recent empirical and theoretical insights on the power of neural networks as formal language recognizers. We will cover the classical proof that infinite-precision RNNs are Turing-complete, formal analysis and experiments comparing the relative power of different finite-precision RNN architectures, and recent work characterizing transformers as language recognizers using circuits and logic. We may also cover applications of this work, including the extraction of discrete models from neural networks. Hopefully, the tutorial will synthesize different analysis frameworks and findings about neural networks into a coherent narrative, and provide a call to action for the ICGI community to engage with exciting open questions.
https://proceedings.mlr.press/v217/merrill23a.html
Results of Neural-Checker Toolbox in Taysir 2023 Competition
This paper presents the results obtained with the Neural-Checker toolbox in the Taysir 2023 challenge. It briefly describes the two tracks of the competition and the specific techniques that yielded the best results with respect to the corresponding scoring metrics.
https://proceedings.mlr.press/v217/mayr23b.html
A Congruence-based Approach to Active Automata Learning from Neural Language Models
The paper proposes an approach for probably approximately correct active learning of probabilistic deterministic finite automata (PDFA) from neural language models. It is based on a congruence over strings that is parameterized by an equivalence relation over probability distributions. The learning algorithm is implemented using a tree data structure of arbitrary (possibly unbounded) degree. The implementation is evaluated with several equivalences on LSTM- and Transformer-based neural language models from different application domains.
https://proceedings.mlr.press/v217/mayr23a.html
Empirical and Theoretical Arguments for Using Properties of Letters for the Learning of Sequential Functions
https://proceedings.mlr.press/v217/markowska23a.html
Lower Bounds for Active Automata Learning
We study lower bounds for the number of output and equivalence queries required for active learning of finite state machines, with a focus on $L^{\#}$, a new learning algorithm that requires fewer queries for learning than other state-of-the-art algorithms on a large collection of benchmarks. We improve the lower bound of Balcázar et al. (1997) on the combined number of output and equivalence queries required by any learning algorithm, and give a simpler proof. We prove that in the worst case $L^{\#}$ needs $n-1$ equivalence queries to learn an FSM with $n$ states, and establish lower bounds on the number of output queries needed by $L^{\#}$ in the worst case. In practical applications, the maximum, over all pairs of inequivalent states, of the length of a shortest separating sequence (MS3) is often just $1$ or $2$. We present $L^{\#}_h$, a version of $L^{\#}$ with bounded lookahead $h$, which learns FSMs with an MS3 of at most $h$ without requiring any equivalence queries, and give lower and upper bounds on its complexity.
https://proceedings.mlr.press/v217/kruger23a.html
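The MS3 quantity can be computed directly from a known machine. This sketch assumes a complete deterministic Mealy machine encoded as a hypothetical dictionary `mealy[state][input] = (output, next_state)`. It finds each pair's shortest separating sequence length by breadth-first search over state pairs and takes the maximum over inequivalent pairs. Per the abstract, $L^{\#}_h$ with lookahead $h$ at least this value needs no equivalence queries. Function names are illustrative.

```python
from collections import deque
from itertools import combinations
from typing import Optional


def shortest_separating_length(mealy: dict, p, q) -> Optional[int]:
    """Length of a shortest input word on which p and q produce different
    outputs, or None if p and q are equivalent (BFS over state pairs)."""
    seen = {(p, q)}
    queue = deque([(p, q, 0)])
    while queue:
        s, t, depth = queue.popleft()
        for a in mealy[s]:
            out_s, next_s = mealy[s][a]
            out_t, next_t = mealy[t][a]
            if out_s != out_t:
                return depth + 1
            if (next_s, next_t) not in seen:
                seen.add((next_s, next_t))
                queue.append((next_s, next_t, depth + 1))
    return None


def ms3(mealy: dict) -> int:
    """Maximum shortest-separating-sequence length over inequivalent pairs."""
    lengths = (shortest_separating_length(mealy, p, q) for p, q in combinations(mealy, 2))
    return max((n for n in lengths if n is not None), default=0)


# Toy counter modulo 3: outputs "1" exactly when the counter wraps around to 0.
mod3 = {s: {"a": ("1" if (s + 1) % 3 == 0 else "0", (s + 1) % 3)} for s in range(3)}
print(ms3(mod3))  # 2: states 0 and 1 are separated only by "aa"
```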
Extending Distributional Learning from Positive Data and Membership Queries
We consider an extension of distributional learning of context-free languages (from positive data and membership queries), where nonterminals are represented by extended regular expressions (allowing all Boolean operations) augmented by atoms corresponding to membership queries. These nonterminals classify a string based not just on its distribution but also on the distributions of its substrings. The learning algorithm for this extension works in essentially the same way as in previous work on distributional learning, while targeting a significantly larger class of context-free languages.
https://proceedings.mlr.press/v217/kanazawa23a.html
A Procedure for Inferring a Minimalist Lexicon from an SMT Model of a Language Acquisition Device
We introduce a constraint-based procedure for inferring a Minimalist Grammar (MG) that falls within the “Logic Grammar” framework. The procedure, implemented as a working computer program, takes as input an MG lexicon and a sequence of sentences paired with their semantic representations. It outputs an MG lexicon that is a superset of the input lexicon and that yields, for each input sentence, a syntactic structure encoding the associated semantic representation. The procedure operates by first constructing an SMT model of a language acquisition device that is constrained by the input lexicon and the (sentence, semantic-representation) pairs, and then using an SMT solver to identify a model solution in which the lexicon is optimized for parsimony. We show how the procedure can be used to form a computational model of a child language learner, presenting two experiments in which the procedure is used for instantaneous and incremental acquisition of an MG lexicon, and find that the optimal MG lexicons inferred by the procedure yield derivations that agree with the prescriptions of contemporary theories of minimalist syntax.
https://proceedings.mlr.press/v217/indurkhya23a.html
Active Inference of Extended Finite State Models of Software Systems
Extended finite state machines (EFSMs) model stateful systems with internal data variables and have many software engineering applications. Such models can be inferred by observing system behaviour. However, existing approaches are either limited to classical FSM models with no internal data state or implicitly require the ability to reset the system under inference, which may not always be possible. We present an extension to the hW-inference algorithm that can infer EFSM models, with input and output parameters as well as guards and internal registers and their data update functions, from systems without a reliable reset. For the problem to be tractable, we require some assumptions on the observability and determinism of the system. The main restriction is that the control flow of the system must be finite, although data types could be infinite.
https://proceedings.mlr.press/v217/groz23a.html
TAYSIR Competition: Transformer+RNN: Algorithms to Yield Simple and Interpretable Representations
This article presents the content of the competition Transformers+RNN: Algorithms to Yield Simple and Interpretable Representations (TAYSIR, the Arabic word for ‘simple’), an online challenge held in Spring 2023 on extracting simpler models from already trained neural networks. These neural nets were trained on sequential categorical/symbolic data. Some of these data were artificial; others came from real-world problems (such as Natural Language Processing, Bioinformatics, and Software Engineering). The trained models covered a large spectrum of architectures, from Simple Recurrent Neural Networks (SRN) to Transformers, including Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM). No constraint was placed on the surrogate models submitted by the participants: any model working on sequential data was accepted. Two tracks were proposed: neural networks trained on Binary Classification tasks and neural networks trained on Language Modeling tasks. The evaluation of the surrogate models took into account both the simplicity of the extracted model and the quality of the approximation of the original model.
https://proceedings.mlr.press/v217/eyraud23a.html
A journey into the Generative AI and large language models: From NLP to BioInformatics
In the last year, the generative AI field has seen a remarkable breakthrough, specifically generative AI models and their applications in the natural language processing domain. These models have achieved new state-of-the-art results on all public datasets and superhuman-level chatting capabilities. The backbone of this breakthrough is large language models, including OpenAI GPT and Google PaLM. The advantage of these large language models is that they can effectively capture the semantics, syntax, grammar, and meaning of characters, words, and sentences from large unlabelled datasets using self-supervised learning. They can later be used to represent sentences and documents better through embeddings, or as a zero-/multi-shot learning method for many NLP tasks. Fortunately, these models have started to be leveraged in other fields such as bioinformatics and biochemistry. This talk will give an overview of large language models and how they have been applied in the bioinformatics field to boost performance on many use cases. Furthermore, it will show how high-performance computing and optimized deep-learning software and libraries have made these models faster and more efficient during training and inference.
https://proceedings.mlr.press/v217/elnaggar23a.html
Formal and Empirical Studies of Counting Behaviour in ReLU RNNs
In recent years, the discussion about the systematicity of neural network learning has gained renewed interest, in particular the formal analysis of neural network behaviour. In this paper, we investigate the capability of single-cell ReLU RNN models to demonstrate precise counting behaviour. Formally, we start by characterising the semi-Dyck-1 language and a semi-Dyck-1 counter machine that can be implemented by a single Rectified Linear Unit (ReLU) cell. We define three Counter Indicator Conditions (CICs) on the weights of a ReLU cell and show that fulfilling these conditions is equivalent to accepting the semi-Dyck-1 language, i.e., to performing exact counting. Empirically, we study the ability of single-cell ReLU RNNs to learn to count by training and testing them on different datasets of Dyck-1 and semi-Dyck-1 strings. While networks that satisfy the CICs count exactly, and thus correctly even on very long strings, the trained networks exhibit a wide range of results and never satisfy the CICs exactly. We investigate the effect of deviating from the CICs and find that configurations that fulfil the CICs are not at a minimum of the loss function in the most common setups. This is consistent with observations in previous research indicating that training ReLU networks for counting tasks often leads to poor results. We finally discuss implications of these results and possible avenues for improving network behaviour.
https://proceedings.mlr.press/v217/el-naggar23a.html
Learning Syntactic Monoids from Samples by extending known Algorithms for learning State Machines
For the inference of regular languages, most current methods learn a version of deterministic finite automata. Syntactic monoids are an alternative representation of regular languages that has some advantages over automata. For example, traces can be parsed starting from any index, and the star-freeness of the language they represent can be checked in polynomial time. To date, however, no passive learning algorithm for syntactic monoids has existed. In this paper, we prove that known state-merging algorithms for learning deterministic finite automata can be instrumented to learn syntactic monoids instead, by using as input a special structure proposed in this paper: the interfix-graph. Further, we introduce a method to encode frequencies on the interfix-graph, such that models can also be learned from only positive traces. We implemented this structure and performed experiments with both traditional data and data containing only positive traces. As such, this work answers basic theoretical and experimental questions regarding a novel passive learning algorithm for syntactic monoids.
https://proceedings.mlr.press/v217/dieck23a.html
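The polynomial-time star-freeness check mentioned above rests on Schützenberger's theorem: a regular language is star-free if and only if its syntactic monoid is aperiodic. This sketch makes two assumptions:
- The monoid is obtained as the transition monoid of a minimal complete DFA, which is isomorphic to the syntactic monoid. It is not learned from samples via the interfix-graph, as in the paper.
- Aperiodicity is checked by iterating the powers of each element.

The check is polynomial in the size of the monoid, which can itself be exponentially larger than the DFA. Function names are illustrative.

```python
def compose(f: tuple, g: tuple) -> tuple:
    """Transformation for 'apply f, then g' on states 0..n-1."""
    return tuple(g[x] for x in f)


def transition_monoid(n_states: int, letters: dict) -> set:
    """Closure of the letter transformations under composition, plus identity."""
    identity = tuple(range(n_states))
    monoid = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in letters.values():
            h = compose(f, g)
            if h not in monoid:
                monoid.add(h)
                frontier.append(h)
    return monoid


def is_aperiodic(monoid: set) -> bool:
    """True iff every element m satisfies m^k = m^(k+1) for some k >= 1."""
    for m in monoid:
        powers = [m]           # m^1, m^2, ...
        nxt = compose(m, m)
        while nxt not in powers:
            powers.append(nxt)
            nxt = compose(nxt, m)
        # nxt repeats an earlier power; the cycle length must be 1.
        if len(powers) - powers.index(nxt) != 1:
            return False
    return True


even_a = transition_monoid(2, {"a": (1, 0)})                    # minimal DFA for (aa)*
a_star_b_star = transition_monoid(3, {"a": (0, 2, 2), "b": (1, 1, 2)})  # a*b*, state 2 = sink
print(is_aperiodic(even_a), is_aperiodic(a_star_b_star))  # False True
```

The language (aa)* yields the cyclic group of order 2, so it is not star-free. The language a*b* yields a 5-element aperiodic monoid.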
Preface
https://proceedings.mlr.press/v217/coste23a.html
Learning state machines from data streams: A generic strategy and an improved heuristic
State machine models simulate the behavior of discrete event systems; they can represent systems such as software systems, network interactions, and control systems, and have been researched extensively. Most learning algorithms, however, assume that all data are available at the beginning of the algorithm, and little research has been done on learning state machines from streaming data. In this paper, we aim to narrow this gap further by presenting a generic method for learning state machines from data streams, as well as a merge heuristic that uses sketches to account for incomplete prefix trees. We implement our approach in an open-source state-merging library and compare it with existing methods. We show the effectiveness of our approach with respect to run time, memory consumption, and quality of results on a well-known open dataset.
https://proceedings.mlr.press/v217/baumgartner23a.html
Learning of Regular Languages by Recurrent Neural Networks? (Mainly Questions)
Recurrent neural network architectures were introduced over 30 years ago. From the start, attention focused on their performance at learning regular languages using some variant of gradient descent. This talk reviews some of the history of that research, includes some empirical observations, and emphasizes questions to which we still seek answers.
https://proceedings.mlr.press/v217/angluin23a.html
Weighted Finite Automata with Failure Transitions: Algorithms and Applications
Weighted finite automata (WFA) are used in many applications including speech recognition, speech synthesis, machine translation, computational biology, image processing, and optical character recognition. Such applications often have strict time and memory requirements, so efficient representations and algorithms are paramount. We examine one useful technique, the use of failure transitions, to represent automata compactly. A failure transition is taken only when no immediate match to the input is possible at a given state. Automata with failure transitions, initially introduced for string matching problems, have found wider use including compactly representing language, pronunciation, transliteration and semantic models. In this talk, we will address the extension of several weighted finite automata algorithms to automata with failure transitions ($\Phi$-WFAs). Efficient algorithms to intersect two $\Phi$-WFAs, to remove failure transitions, to trim, and to compute the shortest distance in a $\Phi$-WFA will be presented. We will demonstrate the application of some of these algorithms on two language modeling tasks: the distillation of arbitrary probabilistic models as weighted finite automata with failure transitions and the federated learning of n-gram language models. We will show the relevance of these methods to the privacy-preserving training of language models for virtual keyboard applications for mobile devices. This talk covers work in collaboration with Michael Riley, Ananda Theertha Suresh, Brian Roark, Vlad Schogol, Mingqing Chen, Rajiv Mathews, Adeline Wong, and Françoise Beaufays.
https://proceedings.mlr.press/v217/allauzen23a.html
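To make the failure-transition semantics concrete, here is a sketch of scoring a word in a deterministic $\Phi$-WFA. It assumes:
- costs that add along a path, as in the tropical semiring;
- failure arcs that form no cycle;
- a toy two-state bigram backoff model with made-up weights.

It shows only the matching rule (follow $\Phi$ only when no arc matches the input). It does not cover the intersection, failure-transition removal, trimming, or shortest-distance algorithms of the talk. Class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhiState:
    arcs: dict = field(default_factory=dict)  # symbol -> (cost, next_state)
    phi: Optional[tuple] = None    # failure arc (cost, next_state), or None
    final: Optional[float] = None  # final cost, or None if non-final


def path_cost(states: list, start: int, word: str) -> Optional[float]:
    """Cost of reading word, with costs added along the path; None if rejected.
    Assumes failure arcs are acyclic, otherwise the inner loop may not end."""
    q, total = start, 0.0
    for symbol in word:
        # Follow failure arcs only while no regular arc matches the symbol.
        while symbol not in states[q].arcs:
            if states[q].phi is None:
                return None
            cost, q = states[q].phi
            total += cost
        cost, q = states[q].arcs[symbol]
        total += cost
    if states[q].final is None:
        return None
    return total + states[q].final


# Toy backoff model: state 0 = unigram state, state 1 = history "a".
states = [
    PhiState(arcs={"a": (1.0, 1), "b": (2.0, 0)}, final=0.5),
    PhiState(arcs={"a": (0.3, 1)}, phi=(0.7, 0), final=0.9),
]
print(path_cost(states, 0, "ab"))  # about 4.2: a (1.0), phi (0.7), b (2.0), final (0.5)
print(path_cost(states, 0, "c"))   # None: no arc for "c" and no failure arc
```

The failure arc lets state 1 store only the bigram it actually has and defer every other symbol to the unigram state. That compactness is what motivates $\Phi$-WFAs.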