Learning DFAs by Evolving Short Sequences of Merges

Kristian Guillaumier, John Abela
Proceedings of the Fifteenth International Conference on Grammatical Inference, PMLR 153:217-236, 2021.

Abstract

The grammatical inference community has been studying evolutionary methods for DFA learning for almost three decades. These methods typically operate by learning a representation of the target DFA, either as a partitioning of the states of a prefix tree acceptor or as an encoding of its transition matrix. In this paper, we present an alternative approach for learning random DFAs over binary alphabets from sparse training data. We first conducted several experiments on thousands of problem instances to study the behaviour of state-merging algorithms and to better understand the conditions under which they succeed or fail. Motivated by these observations, we implemented an evolutionary algorithm in which the chromosomes encode short sequences of merges selected from a subset of high state-reduction merges. The fitness of a chromosome is measured by extending it using the EDSM heuristic; the size of the final hypothesis is then used to score the entire sequence. To improve runtime performance, we use a method that reliably estimates the fitness of a sequence of merges without extending it completely. Using the state-of-the-art EDSM algorithm as a baseline, we observe that our approach finds low-error hypotheses or the exact target DFAs with a considerably higher likelihood.
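
As an illustration of the evolutionary loop described in the abstract, the Python sketch below shows one possible way to evolve short merge sequences. It is not the authors' implementation: candidate_merges and edsm_extend are hypothetical stubs standing in for the real prefix-tree-acceptor machinery and EDSM extension, and the hypothesis sizes they return are simulated.

import random

# Minimal sketch of the evolutionary loop (assumption-based, not the
# authors' code). A real implementation would build an augmented prefix
# tree acceptor (APTA) from the training strings, enumerate valid merges,
# and extend partial merge sequences with the EDSM heuristic.

def candidate_merges(apta, pool_size=20):
    # Hypothetical stub: return identifiers for a pool of
    # high state-reduction merges.
    return list(range(pool_size))

def edsm_extend(apta, merge_sequence):
    # Hypothetical stub: pretend to extend the partial merge sequence
    # with an EDSM-style greedy search and return the size of the
    # resulting hypothesis (smaller is better). The size is simulated
    # deterministically from the sequence itself.
    rng = random.Random(hash(tuple(merge_sequence)))
    return 10 + rng.randint(0, 40)

def fitness(apta, chromosome):
    # Smaller final hypotheses score higher.
    return -edsm_extend(apta, chromosome)

def evolve(apta, seq_len=5, pop_size=30, generations=50):
    pool = candidate_merges(apta)
    population = [random.sample(pool, seq_len) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(apta, c), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, seq_len)
            child = a[:cut] + b[cut:]            # one-point crossover
            if random.random() < 0.2:            # mutate one merge
                child[random.randrange(seq_len)] = random.choice(pool)
            children.append(child)
        population = survivors + children
    return max(population, key=lambda c: fitness(apta, c))

if __name__ == "__main__":
    best = evolve(apta=None)
    print("best merge sequence:", best)

In the paper's setting, edsm_extend would correspond to running EDSM to completion (or estimating its outcome without a full extension), and chromosomes would be restricted to valid, compatible merges.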

Cite this Paper


BibTeX
@InProceedings{pmlr-v153-guillaumier21a,
  title = {Learning DFAs by Evolving Short Sequences of Merges},
  author = {Guillaumier, Kristian and Abela, John},
  booktitle = {Proceedings of the Fifteenth International Conference on Grammatical Inference},
  pages = {217--236},
  year = {2021},
  editor = {Chandlee, Jane and Eyraud, Rémi and Heinz, Jeff and Jardine, Adam and van Zaanen, Menno},
  volume = {153},
  series = {Proceedings of Machine Learning Research},
  month = {23--27 Aug},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v153/guillaumier21a/guillaumier21a.pdf},
  url = {https://proceedings.mlr.press/v153/guillaumier21a.html},
  abstract = {The grammatical inference community has been studying evolutionary methods for DFA learning for almost three decades. These methods typically operate by learning a representation of the target DFA, either as a partitioning of the states of a prefix tree acceptor or as an encoding of its transition matrix. In this paper, we present an alternative approach for learning random DFAs over binary alphabets from sparse training data. We first conducted several experiments on thousands of problem instances to study the behaviour of state-merging algorithms and to better understand the conditions under which they succeed or fail. Motivated by these observations, we implemented an evolutionary algorithm in which the chromosomes encode short sequences of merges selected from a subset of high state-reduction merges. The fitness of a chromosome is measured by extending it using the EDSM heuristic; the size of the final hypothesis is then used to score the entire sequence. To improve runtime performance, we use a method that reliably estimates the fitness of a sequence of merges without extending it completely. Using the state-of-the-art EDSM algorithm as a baseline, we observe that our approach finds low-error hypotheses or the exact target DFAs with a considerably higher likelihood.}
}
Endnote
%0 Conference Paper
%T Learning DFAs by Evolving Short Sequences of Merges
%A Kristian Guillaumier
%A John Abela
%B Proceedings of the Fifteenth International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2021
%E Jane Chandlee
%E Rémi Eyraud
%E Jeff Heinz
%E Adam Jardine
%E Menno van Zaanen
%F pmlr-v153-guillaumier21a
%I PMLR
%P 217--236
%U https://proceedings.mlr.press/v153/guillaumier21a.html
%V 153
%X The grammatical inference community has been studying evolutionary methods for DFA learning for almost three decades. These methods typically operate by learning a representation of the target DFA, either as a partitioning of the states of a prefix tree acceptor or as an encoding of its transition matrix. In this paper, we present an alternative approach for learning random DFAs over binary alphabets from sparse training data. We first conducted several experiments on thousands of problem instances to study the behaviour of state-merging algorithms and to better understand the conditions under which they succeed or fail. Motivated by these observations, we implemented an evolutionary algorithm in which the chromosomes encode short sequences of merges selected from a subset of high state-reduction merges. The fitness of a chromosome is measured by extending it using the EDSM heuristic; the size of the final hypothesis is then used to score the entire sequence. To improve runtime performance, we use a method that reliably estimates the fitness of a sequence of merges without extending it completely. Using the state-of-the-art EDSM algorithm as a baseline, we observe that our approach finds low-error hypotheses or the exact target DFAs with a considerably higher likelihood.
APA
Guillaumier, K., & Abela, J. (2021). Learning DFAs by Evolving Short Sequences of Merges. Proceedings of the Fifteenth International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 153:217-236. Available from https://proceedings.mlr.press/v153/guillaumier21a.html.