Induction of Directed Acyclic Word Graph in a Bioinformatics Task
The 12th International Conference on Grammatical Inference, PMLR 34:207-217, 2014.
In this paper a new algorithm for the induction of a Directed Acyclic Word Graph (DAWG) is proposed. A DAWG can serve as a very efficient data structure for lexicon representation and fast string matching, and have a variety of applications. Similar structures are being investigated in the theory of formal languages and grammatical inference, namely deterministic and nondeterministic finite automata (DFA and NFA, respectively). Since a DAWG is acyclic the proposed method is suited for problems where the target language does not necessarily have to be infinite. The experiments have been performed for a dataset from the domain of bioinformatics, and our results are compared with those obtained using the current state-of-the-art methods in heuristic DFA induction.