String Extension Learning Despite Noisy Intrusions
Proceedings of 16th edition of the International Conference on Grammatical Inference, PMLR 217:80-95, 2023.
We examine the conditions in which string extension learning algorithms are able to identify classes of formal languages in the limit from noisy data presentations in polynomial time. A data presentation for a formal language $L$ is noisy if it contains words belonging to the complement of $L$. In the general case, string extensions learners cannot distinguish noise from true examples and are led astray. The main result is that relative frequencies can be used to distinguish noisy examples from true examples provided the data presentations are constrained to those in which relative frequencies are uniformly present and exceed the rate at which noise is introduced.