- title: 'Preface'
abstract: 'Preface to the Proceedings of the Eleventh International Conference on Grammatical Inference September 5-8, 2012, University of Maryland, College Park, United States.'
volume: 21
URL: http://proceedings.mlr.press/v21/heinz12a.html
PDF: http://proceedings.mlr.press/v21/heinz12a/heinz12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-heinz12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Heinz
given: Jeffrey
- family: de la Higuera
given: Colin
- family: Oates
given: Tim
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 1-3
id: heinz12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 1
lastpage: 3
published: 2012-08-16 00:00:00 +0000
- title: 'Learning and Testing the Bounded Retransmission Protocol'
abstract: 'Using a well-known industrial case study from the verification literature, the bounded retransmission protocol, we show how active learning can be used to establish the correctness of protocol implementation I relative to a given reference implementation R. Using active learning, we learn a model M_R of reference implementation R, which serves as input for a model based testing tool that checks conformance of implementation I to M_R. In addition, we also explore an alternative approach in which we learn a model M_I of implementation I, which is compared to model M_R using an equivalence checker. Our work uses a unique combination of software tools for model construction (Uppaal), active learning (LearnLib, Tomte), model-based testing (JTorX, TorXakis) and verification (CADP, MRMC). We show how these tools can be used for learning these models, analyzing the obtained results, and improving the learning performance.'
volume: 21
URL: http://proceedings.mlr.press/v21/aarts12a.html
PDF: http://proceedings.mlr.press/v21/aarts12a/aarts12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-aarts12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Aarts
given: Fides
- family: Kuppens
given: Harco
- family: Tretmans
given: Jan
- family: Vaandrager
given: Frits
- family: Verwer
given: Sicco
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 4-18
id: aarts12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 4
lastpage: 18
published: 2012-08-16 00:00:00 +0000
- title: 'Actively Learning Probabilistic Subsequential Transducers'
abstract: 'In this paper we investigate learning of probabilistic subsequential transducers in an active learning environment. In our learning algorithm the learner interacts with an oracle by asking probabilistic queries on the observed data. We prove our algorithm in an identification in the limit model. We also provide experimental evidence to show the correctness and to analyze the learnability of the proposed algorithm.'
volume: 21
URL: http://proceedings.mlr.press/v21/akram12a.html
PDF: http://proceedings.mlr.press/v21/akram12a/akram12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-akram12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Akram
given: Hasan Ibne
- family: La Higuera
given: Colin
- family: Eckert
given: Claudia
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 19-33
id: akram12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 19
lastpage: 33
published: 2012-08-16 00:00:00 +0000
- title: 'Bootstrapping and Learning PDFA in Data Streams'
abstract: 'Markovian models with hidden state are widely-used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for infering models in this class under the stringent \emphdata stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We provide rigorous PAC-like bounds for all of the above, as well as an evaluation on synthetic data showing that the algorithm performs well in practice. Our algorithm makes a key usage of several old and new sketching techniques. In particular, we develop a new sketch for implementing bootstrapping in a streaming setting which may be of independent interest. In experiments we have observed that this sketch yields important reductions in the examples required for performing some crucial statistical tests in our algorithm.'
volume: 21
URL: http://proceedings.mlr.press/v21/balle12a.html
PDF: http://proceedings.mlr.press/v21/balle12a/balle12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-balle12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Balle
given: Borja
- family: Castro
given: Jorge
- family: Gavaldà
given: Ricard
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 34-48
id: balle12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 34
lastpage: 48
published: 2012-08-16 00:00:00 +0000
- title: 'Speeding Up Syntactic Learning Using Contextual Information'
abstract: 'It has been shown in (Angluin and Becerra-Bonache, 2010, 2011) that interactions between a learner and a teacher can help language learning. In this paper, we make use of additional contextual information in a pairwise-based generative approach aiming at learning (situation,sentence)-pair-hidden markov models. We show that this allows a significant speed-up of the convergence of the syntactic learning. We apply our model on a toy natural language task in Spanish dealing with geometric objects.'
volume: 21
URL: http://proceedings.mlr.press/v21/becerra12a.html
PDF: http://proceedings.mlr.press/v21/becerra12a/becerra12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-becerra12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Bonache
given: Leonor Becerra
- family: Fromont
given: Elisa
- family: Habrard
given: Amaury
- family: Perrot
given: Michaël
- family: Sebban
given: Marc
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 49-53
id: becerra12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 49
lastpage: 53
published: 2012-08-16 00:00:00 +0000
- title: 'Clearing Restarting Automata and Grammatical Inference'
abstract: 'Clearing and subword-clearing restarting automata are linguistically motivated models of automata. We investigate the problem of grammatical inference for such automata based on the given set of positive and negative samples. We show that it is possible to identify these models in the limit. In this way we can learn a large class of languages. On the other hand, we prove that the task of finding a clearing restarting automaton consistent with a given set of positive and negative samples is NP-hard, provided that we impose an upper bound on the width of its instructions.'
volume: 21
URL: http://proceedings.mlr.press/v21/cerno12a.html
PDF: http://proceedings.mlr.press/v21/cerno12a/cerno12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-cerno12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Černo
given: Peter
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 54-68
id: cerno12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 54
lastpage: 68
published: 2012-08-16 00:00:00 +0000
- title: 'Integrating Grammatical Inference into Robotic Planning'
abstract: 'This paper presents a method for the control synthesis of robotic systems in an unknown, dynamic, and adversarial environments. We (1) incorporate a grammatical inference module that identifies the governing dynamics of the adversarial environment and (2) utilize game theory to compute a motion plan for a system given a task specification. The framework is flexible and modular since different games can be formulated for different system objectives and different grammatical inference algorithms can be utilized depending on the abstract nature of the dynamic environment.'
volume: 21
URL: http://proceedings.mlr.press/v21/chandlee12a.html
PDF: http://proceedings.mlr.press/v21/chandlee12a/chandlee12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-chandlee12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Chandlee
given: Jane
- family: Fu
given: Jie
- family: Karydis
given: Konstantinos
- family: Koirala
given: Cesar
- family: Heinz
given: Jeffrey
- family: Tanner
given: Herbert
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 69-83
id: chandlee12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 69
lastpage: 83
published: 2012-08-16 00:00:00 +0000
- title: 'Beyond Semilinearity: Distributional Learning of Parallel Multiple Context-free Grammars'
abstract: 'Semilinearity is widely held to be a linguistic invariant but, controversially, some linguistic phenomena in languages like Old Georgian and Yoruba seem to violate this constraint. In this paper we extend distributional learning to the class of parallel multiple context-free grammars, a class which as far as is known includes all attested natural languages, even taking an extreme view on these examples. These grammars may have a copying operation that can recursively copy constituents, allowing them to generate non-semilinear languages. We generalise the notion of a context to a class of functions that include copying operations. The congruential approach is ineffective at this level of the hierarchy; accordingly we extend this using dual approaches, defining nonterminals using sets of these generalised contexts. As a corollary we also extend the multiple context free grammars using the lattice based approaches.'
volume: 21
URL: http://proceedings.mlr.press/v21/clark12a.html
PDF: http://proceedings.mlr.press/v21/clark12a/clark12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-clark12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Clark
given: Alexander
- family: Yoshinaka
given: Ryo
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 84-96
id: clark12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 84
lastpage: 96
published: 2012-08-16 00:00:00 +0000
- title: 'Locally Substitutable Languages for Enhanced Inductive Leaps'
abstract: 'Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predict the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and \emphk,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real proteic sequence set.'
volume: 21
URL: http://proceedings.mlr.press/v21/coste12a.html
PDF: http://proceedings.mlr.press/v21/coste12a/coste12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-coste12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Coste
given: FranÃ§ois
- family: Garet
given: Gaëlle
- family: Nicolas
given: Jacques
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 97-111
id: coste12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 97
lastpage: 111
published: 2012-08-16 00:00:00 +0000
- title: 'Grammar Induction: Beyond Local Search'
abstract: 'Many approaches to probabilistic grammar induction operate by iteratively improving a single grammar, beginning with an initial guess. These local search paradigms include (variational) EM, MCMC, and greedy model merging or splitting procedures. Unfortunately, local search methods tend to get caught in local optima, even with random restarts. Two approaches are outlined that try to avoid this problem. One uses branch-and-bound methods from mathematical programming to eliminate regions of parameter space that cannot contain the global optimum. The other is inspired by recent work on deep learning, and uses spectral methods to build up featural representations of all substrings, without premature commitment to which substrings are constituents.'
volume: 21
URL: http://proceedings.mlr.press/v21/eisner12a.html
PDF: http://proceedings.mlr.press/v21/eisner12a/eisner12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-eisner12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Eisner
given: Jason
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 112-113
id: eisner12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 112
lastpage: 113
published: 2012-08-16 00:00:00 +0000
- title: 'Learning Substitutable Binary Plane Graph Grammars'
abstract: 'While some heuristics exist for the learning of graph grammars, few has been done on the theoretical side. Due to complexity issues, the class of graphs has to be restricted: this paper deals with the subclass of plane graphs, which correspond to drawings of planar graphs. This allows us to introduce a new kind of graph grammars, using a face-replacement mechanism. To learn them, we extend recent successful techniques developed for string grammars, and based on a property on target languages: the substitutability property. We show how this property can be extended to plane graph languages and finally state the first identification in the limit result for a class of graph grammars, as far as we know.'
volume: 21
URL: http://proceedings.mlr.press/v21/eyraud12a.html
PDF: http://proceedings.mlr.press/v21/eyraud12a/eyraud12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-eyraud12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Eyraud
given: RÃ©mi
- family: Janodet
given: Jean-Christophe
- family: Oates
given: Tim
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 114-128
id: eyraud12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 114
lastpage: 128
published: 2012-08-16 00:00:00 +0000
- title: 'Learning Tree Adjoining Grammars from Structures and Strings'
abstract: 'We investigate the learnability of certain subclasses of tree adjoining grammars (TAGs). TAGs are based on two tree-tree operations, and generate structures known as \emph{derived trees}. The corresponding strings form a \emph{mildly context-sensitive} language. We prove that even very constrained subclasses of TAGs are not learnable from structures (derived trees) or strings, demonstrating that this type of problem is far from trivial. We also demonstrate that a large (parameterized) family of classes of TAGs is learnable from strings.'
volume: 21
URL: http://proceedings.mlr.press/v21/florencio12a.html
PDF: http://proceedings.mlr.press/v21/florencio12a/florencio12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-florencio12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Costa Florêncio
given: Christophe
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 129-132
id: florencio12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 129
lastpage: 132
published: 2012-08-16 00:00:00 +0000
- title: 'Improving Model Inference of Black Box Components having Large Input Test Set'
abstract: 'The deterministic finite automata (DFA) learning algorithm $L*$ has been extended to learn Mealy machine models which are more succinct for \emph{input/output} (i/o) based systems. We propose an optimized learning algorithm $L_1$ to infer Mealy models of software black box components. The $L_1$ algorithm uses a modified observation table and avoids adding unnecessary elements to its columns and rows. The proposed improvements reduce the worst case time complexity. The $L_1$ algorithm is compared with the existing Mealy inference algorithms and the experiments conducted on a comprehensive set confirm the gain.'
volume: 21
URL: http://proceedings.mlr.press/v21/irfan12a.html
PDF: http://proceedings.mlr.press/v21/irfan12a/irfan12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-irfan12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Irfan
given: Muhammad Naeem
- family: Groz
given: Roland
- family: Oriat
given: Catherine
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 133-138
id: irfan12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 133
lastpage: 138
published: 2012-08-16 00:00:00 +0000
- title: 'Learning of Bi-ω Languages from Factors'
abstract: 'Higuera and Janodet (2001) gave a polynomial algorithm that identifies the class of safe ω-languages which is a subclass of deterministic ω-languages from positive and negative prefixes. As an extension of this work we study the learning of the family of bi-ω languages.'
volume: 21
URL: http://proceedings.mlr.press/v21/jayasri12a.html
PDF: http://proceedings.mlr.press/v21/jayasri12a/jayasri12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-jayasri12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Jayasrirani
given: M.
- family: Begam
given: M. H.
- family: D.G
given: D. G. Thomas
- family: Emerald
given: J. D.
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 139-144
id: jayasri12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 139
lastpage: 144
published: 2012-08-16 00:00:00 +0000
- title: 'Inducing Partially Observable Markov Decision Processes'
abstract: 'The partially observable Markov decision process (POMDP) model plays an important role in the field of reinforcement learning. It captures the problem of decision making when some important features of the environment are not visible to the decision maker. A number of approaches have been proposed for inducing POMDP models from data, a problem that has important parallels with grammar induction.'
volume: 21
URL: http://proceedings.mlr.press/v21/littman12a.html
PDF: http://proceedings.mlr.press/v21/littman12a/littman12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-littman12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Littman
given: Michael L.
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 145-148
id: littman12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 145
lastpage: 148
published: 2012-08-16 00:00:00 +0000
- title: 'Inference of k-Testable Directed Acyclic Graph Languages'
abstract: 'In this paper, we tackle the task of graph language learning. We first extend the well-known classes of \emph{$k$-testability} and \emph{$k$-testability in the strict sense} languages to directed graph languages. Second, we propose a graph automata model for directed acyclic graph languages. This graph automata model is used to propose a grammatical inference algorithm to learn the class of directed acyclic $k$-testable in the strict sense graph languages. The algorithm runs in polynomial time and identifies this class of languages from positive data.'
volume: 21
URL: http://proceedings.mlr.press/v21/lopez12a.html
PDF: http://proceedings.mlr.press/v21/lopez12a/lopez12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-lopez12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Lòpez
given: Damiàn
- family: Calera-Rubio
given: Jorge
- family: Gallego-Sànchez
given: Javier
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 149-163
id: lopez12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 149
lastpage: 163
published: 2012-08-16 00:00:00 +0000
- title: 'A Lattice of Sets of Alignments Built on the Common Subwords in a Finite Language'
abstract: 'We define the locally maximal subwords and locally minimal superwords common to a finite set of words. We also define the corresponding sets of alignments. We give a partial order relation between such sets of alignments, as well as two operations between them. We show that the constructed family of sets of alignments has the lattice structure. We give hints to use this structure as a machine learning basis for inducing a generalization of the set of words.'
volume: 21
URL: http://proceedings.mlr.press/v21/miclet12a.html
PDF: http://proceedings.mlr.press/v21/miclet12a/miclet12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-miclet12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Miclet
given: Laurent
- family: Barbot
given: Nelly
- family: Jeudy
given: Baptiste
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 164-176
id: miclet12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 164
lastpage: 176
published: 2012-08-16 00:00:00 +0000
- title: 'Estimation of Generating Processes of Strings Represented with Patterns and Refinements'
abstract: 'We formalize generating processes of strings based on patterns and substitutions, and give an algorithm to estimate a probability mass function on substitutions, which is an element of processes. Patterns are non-empty sequences of characters and variables. Variables indicate unknown substrings and are replaced with other patterns by substitutions. By introducing variables and substitutions, we can deal with the difficulty of preparing production rules in generative grammar and of representing context-sensitivity with them. Our key idea is to regard sequences of substitutions as generations of strings, and to give probabilities of substitutions like PCFG. In this study, after giving a problem to estimate a probability mass function from strings based on our formalization, we solve it by the Passing algorithm that runs in an iterative manner. Our experimental results with synthetic strings show that our method estimates probability mass functions with sufficient small errors.'
volume: 21
URL: http://proceedings.mlr.press/v21/otaki12a.html
PDF: http://proceedings.mlr.press/v21/otaki12a/otaki12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-otaki12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Otaki
given: Keisuke
- family: Yamamoto
given: Akihiro
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 177-182
id: otaki12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 177
lastpage: 182
published: 2012-08-16 00:00:00 +0000
- title: 'Applying Grammar Inference To Identify Generalized Patterns of Facial Expressions of Pain'
abstract: 'We present an application of grammar inference in the domain of facial expression analysis. Typically, only *sets* of AUs which occur in a given time frame are used for expression analysis, the *sequence* in which these AUs occur is ignored. We wanted to explore whether the strucural patterns of AU appearances contain diagnostically relevant information. We applied alignment-based learning (ABL) to data of facial expressions of pain collected in a psychological study. To evaluate the quality of the induced grammars we applied cross-validation to estimate the generalization error. We can show that the learned grammars have reasonably high coverages for unseen pain sequences. However, the number of rules of the learned grammars cannot be reduced substantially without loss of generalization.'
volume: 21
URL: http://proceedings.mlr.press/v21/schmid12a.html
PDF: http://proceedings.mlr.press/v21/schmid12a/schmid12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-schmid12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Schmid
given: Ute
- family: Siebers
given: Michael
- family: Seuß
given: Dominik
- family: Kunz
given: Miriam
- family: Lautenbacher
given: Stefan
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 183-188
id: schmid12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 183
lastpage: 188
published: 2012-08-16 00:00:00 +0000
- title: 'Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models'
abstract: 'Modern grammar induction systems often employ curriculum learning strategies that begin by training on a subset of all available input that is considered simpler than the full data. Traditionally, filtering has been at granularities of whole input units, e.g., discarding entire sentences with too many words or punctuation marks. We propose instead viewing inter-punctuation fragments as atoms, initially, thus making some simple phrases and clauses of complex sentences available to training sooner. Splitting input text at punctuation in this way improved our state-of-the-art grammar induction pipeline. We observe that resulting partial data, i.e., mostly incomplete sentence fragments, can be analyzed using reduced parsing models which, we show, can be easier to bootstrap than more nuanced grammars. Starting with a new, bare dependency-and-boundary model (DBM-0), our grammar inducer attained 61.2% directed dependency accuracy on Section 23 (all sentences) of the Wall Street Journal corpus: more than 2% higher than previous published results for this task.'
volume: 21
URL: http://proceedings.mlr.press/v21/spitkovsky12a.html
PDF: http://proceedings.mlr.press/v21/spitkovsky12a/spitkovsky12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-spitkovsky12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Spitkovsky
given: Valentin I.
- family: Alshawi
given: Hiyan
- family: Jurafsky
given: Daniel
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 189-194
id: spitkovsky12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 189
lastpage: 194
published: 2012-08-16 00:00:00 +0000
- title: 'Active Automata Learning: From DFAs to Interface Programs and Beyond'
abstract: 'This paper reviews the development of active learning in the last decade under the perspective of treating of data, a major source of undecidability, and therefore a key problem to achieve practicality. Starting with the first case studies, in which data was completely disregarded, we revisit different steps towards dealing with data explicitly in active learning: We discuss Mealy Machines as a model for systems with (data) output, automated alphabet abstraction refinement as a two-dimensional extension of the partition-refinement based approach of active learning for inferring not only states but also optimal alphabet abstractions, and Register Mealy Machines, which can be regarded as programs restricted to data-independent data processing as it is typical for protocols or interface programs. We are convinced that this development has the potential to transform active automata learning into a technology of high practical importance.'
volume: 21
URL: http://proceedings.mlr.press/v21/steffen12a.html
PDF: http://proceedings.mlr.press/v21/steffen12a/steffen12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-steffen12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Steffen
given: Bernhard
- family: Howar
given: Falk
- family: Isberner
given: Malte
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 195-209
id: steffen12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 195
lastpage: 209
published: 2012-08-16 00:00:00 +0000
- title: 'Fuzzy Grammar-based Prediction of Amyloidogenic Regions'
abstract: 'In this paper, we address the problem of predicting the location of amyloidogenic regions in proteins. The language of protein sequence can be described by using a formal system such as fuzzy context-free grammar, and the problem of amyloidogenic region recognition can be replaced by fuzzy grammar induction. The induced fuzzy grammar achieved 70.6% accuracy and 96.7% specificity on a recently published amyloidogenic dataset. Our results are comparable to other methods dedicated to recognize amyloid proteins.'
volume: 21
URL: http://proceedings.mlr.press/v21/unold12a.html
PDF: http://proceedings.mlr.press/v21/unold12a/unold12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-unold12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Unold
given: Olgierd
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 210-219
id: unold12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 210
lastpage: 219
published: 2012-08-16 00:00:00 +0000
- title: 'Learning Interpretations Using Sequence Classification'
abstract: 'In this paper we present a system that assigns interpretations, in the form of shallow semantic frame descriptions, to natural language sentences. The system searches for relevant patterns, consisting of words from the sentences, to identify the correct semantic frame and associated slot values. For each of these choices, a separate classifier is trained. Each classifier learns the boundaries between different languages, which each correspond to a particular class. The different classifiers each have their own viewpoint on the data depending on which aspect needs to be identified.'
volume: 21
URL: http://proceedings.mlr.press/v21/vanzaanen12a.html
PDF: http://proceedings.mlr.press/v21/vanzaanen12a/vanzaanen12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-vanzaanen12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Zaanen
given: Menno
- family: Loo
given: Janneke
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 220-223
id: vanzaanen12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 220
lastpage: 223
published: 2012-08-16 00:00:00 +0000
- title: 'Model Merging Versus Model Splitting Context-Free Grammar Induction'
abstract: 'When comparing different grammatical inference algorithms, it becomes evident that generic techniques have been used in different systems. Several finite-state learning algorithms use state-merging as their underlying technique and a collection of grammatical inference algorithms that aim to learn context-free grammars build on the concept of substitutability to identify potential grammar rules. When learning context-free grammars, there are essentially two approaches: model merging, which generalizes with more data, and model splitting, which specializes with more data. Both approaches can be combined sequentially in a generic framework. In this article, we investigate the impact of different approaches within the first phase of the framework on system performance.'
volume: 21
URL: http://proceedings.mlr.press/v21/vanzaanen12b.html
PDF: http://proceedings.mlr.press/v21/vanzaanen12b/vanzaanen12b.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-vanzaanen12b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Zaanen
given: Menno
- family: Noord
given: Nanne
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 224-236
id: vanzaanen12b
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 224
lastpage: 236
published: 2012-08-16 00:00:00 +0000
- title: 'Induction of Non-Deterministic Finite Automata on Supercomputers'
abstract: 'The problem of inducing automata of minimal size consistent with finite sets of examples and counter-examples is the combinatorial optimization task of grammatical inference and is known to be computationally hard. Both an exact and a heuristic method of finding a non-deterministic finite automaton (NFA) with a given number of states such that all examples are accepted and all counter-examples are rejected by the automaton will be evaluated. The methods are based on a translation of NFA identification into integer nonlinear programming.'
volume: 21
URL: http://proceedings.mlr.press/v21/wieczorek12a.html
PDF: http://proceedings.mlr.press/v21/wieczorek12a/wieczorek12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-wieczorek12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Wieczorek
given: Wojciech
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 237-242
id: wieczorek12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 237
lastpage: 242
published: 2012-08-16 00:00:00 +0000
- title: 'Results of the PAutomaC Probabilistic Automaton Learning Competition'
abstract: 'Approximating distributions over strings is a hard learning problem. Typical GI techniques involve using finite state machines as models and attempting to learn both the structure and the weights, simultaneously. The PAutomaC competition is the first challenge to allow comparison between methods and algorithms and builds a first state of the art for these techniques. Both artificial data and real data were proposed and contestants were to try to estimate the probabilities of test strings. The purpose of this paper is to provide an overview of the implementation details of PAutomaC and to report the final results of the competition.'
volume: 21
URL: http://proceedings.mlr.press/v21/verwer12a.html
PDF: http://proceedings.mlr.press/v21/verwer12a/verwer12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-verwer12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Verwer
given: Sicco
- family: Eyraud
given: Rémi
- family: Higuera
given: Colin
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 243-248
id: verwer12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 243
lastpage: 248
published: 2012-08-16 00:00:00 +0000
- title: 'Treba: Efficient Numerically Stable EM for PFA'
abstract: 'Training probabilistic finite automata with the EM/Baum-Welch algorithm is computationally very intensive, especially if random ergodic automata are used initially, and additional strategies such as deterministic annealing are used. In this paper we present some optimization and parallelization strategies to the Baum-Welch algorithm that often allow for training of much larger automata with a larger number of observations. The tool, \emphtreba, which implements the optimizations, is available open-source and its results were used to participate in the PAutomaC PFA/HMM competition.'
volume: 21
URL: http://proceedings.mlr.press/v21/hulden12a.html
PDF: http://proceedings.mlr.press/v21/hulden12a/hulden12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-hulden12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Hulden
given: Mans
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 249-253
id: hulden12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 249
lastpage: 253
published: 2012-08-16 00:00:00 +0000
- title: 'Simple Variable Length N-grams for Probabilistic Automata Learning'
abstract: 'This paper describes an approach used in the 2012 Probabilistic Automata Learning Competition. The main goal of the competition was to obtain insights about which techniques and approaches work best for sequence learning based on different kinds of automata generating machines. This paper proposes the usage of n-gram models with variable length. Experiments show that, using the test sets provided by the competition, the variable-length approach works better than fixed 3-grams.'
volume: 21
URL: http://proceedings.mlr.press/v21/kepler12a.html
PDF: http://proceedings.mlr.press/v21/kepler12a/kepler12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-kepler12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Kepler
given: Fabio N.
- family: Mergen
given: Sergio L. S.
- family: Billa
given: Cleo Z.
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 254-258
id: kepler12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 254
lastpage: 258
published: 2012-08-16 00:00:00 +0000
- title: 'Marginalizing Out Transition Probabilities for Several Subclasses of PFAs'
abstract: 'A Bayesian manner which marginalizes transition probabilities can be generally applied to various kinds of probabilistic finite state machine models. Based on such a Bayesian manner, we implemented and compared three algorithms: variable-length gram, state merging method for PDFAs, and collapsed Gibbs sampling for PFAs. Among those, collapsed Gibbs sampling for PFAs performed the best on the data from the pre-competition stage of PAutomaC, although it consumes large computation resources.'
volume: 21
URL: http://proceedings.mlr.press/v21/shibata12a.html
PDF: http://proceedings.mlr.press/v21/shibata12a/shibata12a.pdf
edit: https://github.com/mlresearch/v21/edit/gh-pages/_posts/2012-08-16-shibata12a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eleventh International Conference on Grammatical Inference'
publisher: 'PMLR'
author:
- family: Shibata
given: Chihiro
- family: Yoshinaka
given: Ryo
editor:
- family: Heinz
given: Jeffrey
- family: Higuera
given: Colin
- family: Oates
given: Tim
address: University of Maryland, College Park, MD, USA
page: 259-263
id: shibata12a
issued:
date-parts:
- 2012
- 8
- 16
firstpage: 259
lastpage: 263
published: 2012-08-16 00:00:00 +0000