
- title: 'Preface'
  abstract: 'Preface to the proceedings.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/chandlee21a.html
  PDF: https://proceedings.mlr.press/v153/chandlee21a/chandlee21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-chandlee21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 1-3
  id: chandlee21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 1
  lastpage: 3
  published: 2021-08-25 00:00:00 +0000
- title: 'Beyond Chomsky Normal Form: Extending Strong Learning Algorithms for PCFGs'
  abstract: 'We extend a recent consistent strong learning algorithm for a subclass of probabilistic context-free grammars in Chomsky normal form, (Clark and Fijalkow, 2020) to a much larger class of grammars by weakening the normal form constraints to allow for a richer class of derivation structures that are not necessarily binary branching,  including among other possibilities unary branching trees which introduce some technical difficulties. By modifying the structural conditions appropriately we obtain an algorithm which is computationally efficient, and a consistent estimator for the class of grammars defined.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/clark21a.html
  PDF: https://proceedings.mlr.press/v153/clark21a/clark21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-clark21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Clark
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 4-17
  id: clark21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 4
  lastpage: 17
  published: 2021-08-25 00:00:00 +0000
- title: 'A Hierarchy of Context-Free Languages Learnable from Positive Data and Membership Queries'
  abstract: 'We consider a generalization of the “dual” approach to distributional learning of context-free grammars, where each nonterminal $A$ is associated with a string set $X_A$ characterized by a finite set $C$ of contexts. Rather than letting $X_A$ be the set of all strings accepted by all contexts in $C$ as in previous works, we allow more flexible uses of the contexts in $C$, using some of them positively (contexts that accept the strings in $X_A$) and others negatively (contexts that do not accept any strings in $X_A$). The resulting more general algorithm works in essentially the same way as before, but on a larger class of context-free languages. '
  volume: 153
  URL: https://proceedings.mlr.press/v153/kanazawa21a.html
  PDF: https://proceedings.mlr.press/v153/kanazawa21a/kanazawa21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-kanazawa21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Makoto
    family: Kanazawa
  - given: Ryo
    family: Yoshinaka
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 18-31
  id: kanazawa21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 18
  lastpage: 31
  published: 2021-08-25 00:00:00 +0000
- title: 'Inside-Outside Algorithm for Macro Grammars'
  abstract: 'We propose an inside-outside algorithm for stochastic macro grammars. Our approach is based on types, which has been inspired by type-based approaches to reasoning about functional programs and higher-order grammars. By considering type derivations instead of ordinary word derivation sequences, we can naturally extend the standard inside-outside algorithm for stochastic context-free grammars to obtain the algorithm for stochastic macro grammars. We have implemented the algorithm and confirmed its effectiveness through an application to the learning of macro grammars.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/kambe21a.html
  PDF: https://proceedings.mlr.press/v153/kambe21a/kambe21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-kambe21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Ryuta
    family: Kambe
  - given: Naoki
    family: Kobayashi
  - given: Ryosuke
    family: Sato
  - given: Ayumi
    family: Shinohara
  - given: Ryo
    family: Yoshinaka
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 32-46
  id: kambe21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 32
  lastpage: 46
  published: 2021-08-25 00:00:00 +0000
- title: 'Learning Input Strictly Local Functions From Their Composition'
  abstract: 'This paper studies the learning of two functions given positive samples of their composition, motivated by an empirical problem in natural language phonology.  Empirically relevant conditions under which this is possible are identified and a provably correct algorithm is given that can <em>semi-strongly</em> identify the two functions in polynomial time and data. In order to clearly illustrate the learning problem and related concepts, we focus on a simple subset of <em>input strictly 2-local functions</em>.  But we further argue that the general learning procedure we propose can be extended to more general classes of functions.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/hua21a.html
  PDF: https://proceedings.mlr.press/v153/hua21a/hua21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-hua21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Wenyue
    family: Hua
  - given: Adam
    family: Jardine
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 47-65
  id: hua21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 47
  lastpage: 65
  published: 2021-08-25 00:00:00 +0000
- title: 'Learning Multiple Independent Tier-based Processes'
  abstract: 'Work on the learnability of tier-based string-to-string functions has so far been limited to those that operate over a single tier. Such functions are, however, generally incapable of modelling multiple simultaneous processes. It is relatively easy to define a class of multi-tiered functions that can handle more than one process at a time, but doing so has negative consequences for learnability. Namely, we conjecture that it is difficult if not impossible to efficiently learn any arbitrary set of tiers from positive data alone. We thus describe the <em>strongly target-specified</em> subclass of multi-tiered functions, whose tiers can be efficiently identified from positive data. In these functions, each input element is associated with a single tier that on its own can fully determine what the element is mapped to. When the tiers act independently in this way, we can learn them in isolation from each other. A transducer representation of the target function can then be constructed using the discovered tiers.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/burness21a.html
  PDF: https://proceedings.mlr.press/v153/burness21a/burness21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-burness21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Phillip
    family: Burness
  - given: Kevin
    family: McMullin
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 66-80
  id: burness21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 66
  lastpage: 80
  published: 2021-08-25 00:00:00 +0000
- title: 'Grammar Interpretations and Learning TSL Online'
  abstract: 'The tier-based strictly local ($\TSL{}$) languages are a class of formal languages that, alongside the strictly piecewise class, effectively model some long-distance generalizations in natural language (Heinz et al., 2011). Two learning algorithms for $\TSL{}$ already exist: one by Jardine and Heinz (2016) and one by Jardine and McMullin (2017). The former is limited in that it cannot learn all $\TSL{}$ patterns. The latter is restricted to a batch-learning environment. We present a general algorithm without these limitations. In particular we show that $\TSL{}$ is efficiently learnable online via reinterpretation of a strictly local grammar, and this mechanism generalizes to the strictly piecewise class as well. However we note that the known $\TSL{}$ learning algorithms are not robust in the face of interaction with other constraints, posing a challenge for the utility of this class for phonotactics.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/lambert21a.html
  PDF: https://proceedings.mlr.press/v153/lambert21a/lambert21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-lambert21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Dakotah
    family: Lambert
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 81-91
  id: lambert21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 81
  lastpage: 91
  published: 2021-08-25 00:00:00 +0000
- title: 'Extracting Weighted Automata for Approximate Minimization in Language Modelling'
  abstract: 'In this paper we study the approximate minimization problem for language modelling.  We assume we are given some language model as a black box. The objective is to obtain a weighted finite automaton (WFA) that fits within a given size constraint and which mimics the behaviour of the original model while minimizing some notion of distance between the black box and the extracted WFA. We provide an algorithm for the approximate minimization of black boxes trained for language modelling of sequential data over a one-letter alphabet. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory. This allows us to use the spectral norm to measure the distance between the black box and the WFA. We provide theoretical guarantees to study the potentially infinite-rank Hankel matrix of the black box, without accessing the training data, and we prove that our method returns an asymptotically-optimal approximation.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/lacroce21a.html
  PDF: https://proceedings.mlr.press/v153/lacroce21a/lacroce21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-lacroce21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Clara
    family: Lacroce
  - given: Prakash
    family: Panangaden
  - given: Guillaume
    family: Rabusseau
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 92-112
  id: lacroce21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 92
  lastpage: 112
  published: 2021-08-25 00:00:00 +0000
- title: 'Extracting Context-Free Grammars from Recurrent Neural Networks using Tree-Automata Learning and A* Search'
  abstract: 'This paper presents (i) an active learning algorithm for visibly pushdown grammars and (ii) shows its applicability for learning surrogate models of recurrent neural networks (RNNs) trained on context-free languages.       Such surrogate models may be used for verification or explainability. Our learning algorithm makes use of the proximity of visibly pushdown languages and regular tree languages and builds on an existing learning algorithm for regular tree languages. Equivalence tests between a given RNN and a hypothesis grammar rely on a mixture of A* search and random sampling. An evaluation of our approach on a set of RNNs from the literature shows good preliminary results.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/barbot21a.html
  PDF: https://proceedings.mlr.press/v153/barbot21a/barbot21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-barbot21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Benoît
    family: Barbot
  - given: Benedikt
    family: Bollig
  - given: Alain
    family: Finkel
  - given: Serge
    family: Haddad
  - given: Igor
    family: Khmelnitsky
  - given: Martin
    family: Leucker
  - given: Daniel
    family: Neider
  - given: Rajarshi
    family: Roy
  - given: Lina
    family: Ye
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 113-129
  id: barbot21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 113
  lastpage: 129
  published: 2021-08-25 00:00:00 +0000
- title: 'Recognizing Long Grammatical Sequences using Recurrent Networks Augmented with an External Differentiable Stack'
  abstract: 'Recurrent neural networks (RNNs) are widely used for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and voice recognition, these stateful models have several critical shortcomings. Specifically, RNNs struggle to recognize very long sequences, which limits their applicability in many important temporal processing and time series forecasting problems. For example, RNNs struggle in recognizing complex context free languages (CFLs), unable to reach 100% accuracy on the training set. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been extensively studied on complex CFLs nor tested on very long sequences that have been seen on training. In fact, earlier work has shown that continuous differentiable memory structures struggle in recognizing complex CFLs over very long sequences. In this paper, we improve the memory-augmented RNN with new architectural and state updating mechanisms that learn to properly balance the use of latent states with external memory. Our improved RNN models exhibit improved generalization and are able to classify long strings generated by complex hierarchical context free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modelling dataset,  achieving stable, robust performance on these benchmarks. Furthermore, we show that our proposed memory-augmented networks are able retain information over long sequences leading to improved generalization for strings up to length $160$.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/mali21a.html
  PDF: https://proceedings.mlr.press/v153/mali21a/mali21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-mali21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Ankur
    family: Mali
  - given: Alexander
    family: Ororbia
  - given: Daniel
    family: Kifer
  - given: Lee
    family: Giles
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 130-153
  id: mali21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 130
  lastpage: 153
  published: 2021-08-25 00:00:00 +0000
- title: 'Investigating Backpropagation Alternatives when Learning to Dynamically Count with Recurrent Neural Networks'
  abstract: 'Artificial neural networks, such as recurrent neural networks (RNNs) and transformers, have empirically demonstrated impressive performance across many natural language processing tasks. However, automated text processing at a deeper and more interpretable level arguably requires extracting intricate patterns such as underlying grammatical structures. As a result, correctly interpreting a neural language model would require an understanding of linguistic structure through formal language theory. Nonetheless, there is often a discrepancy between theoretical and practical findings that restrict models informed by formal language theory in real-life scenarios. For instance, while learning context-free grammars (CFGs), existing neural models fall short because they lack appropriate memory structures. In this work, we investigate how learning algorithms affect the generalization ability of RNNs that are designed to learn context-free languages (CFGs) as well as their ability to encode hierarchical representations. To do so, we investigate a range of learning algorithms on complex, context-free languages such as the Dyck languages, with a focus on the RNN’s ability to generalize to longer sequences when processing a CFG. Our results demonstrate that a Long Short-term memory (LSTM) RNN equipped with second-order connections, trained with the sparse attentive backtracking (SAB) algorithm, performs stably across various Dyck languages and successfully emulates real-time-counter machines. We empirically show that RNNs without external memory are incapable of recognizing Dyck-2 languages, which require a stack-like structure. We finally investigate each learning algorithm’s performance on real-world language modeling tasks using the Penn Tree Bank and text8 benchmarks. We further investigate how an increase in model parameters affects each RNN’s stability and grammar recognition performance when trained using different learning algorithms.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/mali21b.html
  PDF: https://proceedings.mlr.press/v153/mali21b/mali21b.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-mali21b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Ankur
    family: Mali
  - given: Alexander
    family: Ororbia
  - given: Daniel
    family: Kifer
  - given: Lee
    family: Giles
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 154-175
  id: mali21b
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 154
  lastpage: 175
  published: 2021-08-25 00:00:00 +0000
- title: 'Using Grammatical Inference to Build Privacy Preserving Data-sets of User Logs'
  abstract: 'In many web applications, user logs are extracted to build a user model which can be part of further development, recommendation systems or personalization. This is the case for education platforms like X5GON. In order to obtain community collaboration, these logs should be shared, but logical privacy issues arise. In this work, we propose to build a user model from a data-set of logs: this will be a timed and probabilistic $k$-testable automaton, which can then be used to generate a new data-set having statistically close characteristics, yet have in which the original sequences have been sufficiently chunked the original data to not be able to identify the original logs. Following ideas from Differencial Privacy, we provide a second algorithm allowing to eliminate any strings whose influence would be too great. Experiments validate the approach.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/connes21a.html
  PDF: https://proceedings.mlr.press/v153/connes21a/connes21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-connes21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Victor
    family: Connes
  - given: Colin
    family: De La Higuera
  - given: Hoel
    family: Le Capitaine
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 176-190
  id: connes21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 176
  lastpage: 190
  published: 2021-08-25 00:00:00 +0000
- title: 'A Toolbox for Context-Sensitive Grammar Induction by Genetic Search'
  abstract: 'This paper introduces a new tool for context-sensitive grammar inference. The source code and library are publicly available via GitHub and NuGet repositories. The article describes the implemented algorithm, input parameters, and the produced output grammar. In addition, the paper contains several use-cases. The described library is written in F#, hence it can be used in any .NET Framework language (F#, C#, C++/CLI, Visual Basic, and J#) and run under the control of varied operating systems.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/wieczorek21a.html
  PDF: https://proceedings.mlr.press/v153/wieczorek21a/wieczorek21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-wieczorek21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Wojciech
    family: Wieczorek
  - given: Łukasz
    family: Strąk
  - given: Arkadiusz
    family: Nowakowski
  - given: Olgierd
    family: Unold
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 191-201
  id: wieczorek21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 191
  lastpage: 201
  published: 2021-08-25 00:00:00 +0000
- title: 'Query Learning Algorithm for Symbolic Weighted Finite Automata'
  abstract: 'We propose a query learning algorithm for an extension of weighted finite automata (WFAs), named symbolic weighted finite automata (SWFAs), which can handle strings over infinite alphabets more efficiently. Based on the idea of symbolic finite automata, SWFAs generalize WFAs by allowing transitions to be functions from a possibly infinite alphabet to weights. Our algorithm can learn SWFAs if functions in transitions are also learnable by queries. We also investigate minimization and equivalence checking for SWFAs.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/suzuki21a.html
  PDF: https://proceedings.mlr.press/v153/suzuki21a/suzuki21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-suzuki21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Kaito
    family: Suzuki
  - given: Diptarama
    family: Hendrian
  - given: Ryo
    family: Yoshinaka
  - given: Ayumi
    family: Shinohara
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 202-216
  id: suzuki21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 202
  lastpage: 216
  published: 2021-08-25 00:00:00 +0000
- title: 'Learning DFAs by Evolving Short Sequences of Merges'
  abstract: 'The grammatical inference community has been studying evolutionary methods for DFA learning for almost three decades. These methods typically operate by learning a representation of the target DFA either as a partitioning the states of a prefix tree acceptor or as an encoding of its transition matrix. In this paper, we present an alternative approach for learning random DFAs over binary alphabets from sparse training data. We first conducted several experiments on thousands of problem instances to study their behaviour and to better understand the conditions under which state merging algorithms succeed or fail. Motivated by these observations, we implemented an evolutionary algorithm in which the chromosomes encode short sequences of merges selected from a subset of high state-reduction merges. The fitness of a chromosome is then measured by extending it using the EDSM heuristic and the size of the final hypothesis is used to score the entire sequence. To improve runtime performance, we use a method that can reliably estimate the fitness of a sequence of merges without extending it completely. We use the state-of-the-art EDSM algorithm as a baseline to compare our results to and observe that we can find low-error hypotheses or the exact target DFAs with a considerably higher likelihood.'
  volume: 153
  URL: https://proceedings.mlr.press/v153/guillaumier21a.html
  PDF: https://proceedings.mlr.press/v153/guillaumier21a/guillaumier21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-guillaumier21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Kristian
    family: Guillaumier
  - given: John
    family: Abela
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 217-236
  id: guillaumier21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 217
  lastpage: 236
  published: 2021-08-25 00:00:00 +0000
- title: 'The complexity of learning linear temporal formulas from examples'
  abstract: 'In this paper we initiate the study of the computational complexity of learning linear temporal logic (LTL) formulas from examples. We construct approximation algorithms for fragments of LTL and prove hardness results;  in particular we obtain tight bounds for the fragment containing only the next operator and conjunctions, and prove NP-hardness results for many fragments. '
  volume: 153
  URL: https://proceedings.mlr.press/v153/fijalkow21a.html
  PDF: https://proceedings.mlr.press/v153/fijalkow21a/fijalkow21a.pdf
  edit: https://github.com/mlresearch//v153/edit/gh-pages/_posts/2021-08-25-fijalkow21a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the Fifteenth International Conference on Grammatical Inference'
  publisher: 'PMLR'
  author: 
  - given: Nathanaël
    family: Fijalkow
  - given: Guillaume
    family: Lagarde
  editor: 
  - given: Jane
    family: Chandlee
  - given: Rémi
    family: Eyraud
  - given: Jeff
    family: Heinz
  - given: Adam
    family: Jardine
  - given: Menno
    prefix: van
    family: Zaanen
  page: 237-250
  id: fijalkow21a
  issued:
    date-parts: 
      - 2021
      - 8
      - 25
  firstpage: 237
  lastpage: 250
  published: 2021-08-25 00:00:00 +0000
