On the Statistical Query Complexity of Learning Semiautomata: a Random Walk Approach

George Giapitzakis, Kimon Fountoulakis, Eshaan Nichani, Jason D. Lee
Proceedings of Thirty Ninth Conference on Learning Theory, PMLR 336:2639-2678, 2026.

Abstract

Semiautomata form a rich class of sequence-processing algorithms with applications in natural language processing, robotics, computational biology, and data mining. We establish the first Statistical Query hardness result for semiautomata under the uniform distribution over input words and initial states. We show that Statistical Query hardness can be established when both the alphabet size and input length are polynomial in the number of states. Unlike the case of deterministic finite automata, where hardness typically arises through the hardness of the language they recognize (e.g., parity), our result is derived solely from the internal state-transition structure of semiautomata. Our analysis reduces the task of distinguishing the final states of two semiautomata to studying the behavior of a random walk on the group $S_{N} \times S_{N}$. By applying tools from Fourier analysis and the representation theory of the symmetric group, we obtain tight spectral gap bounds, demonstrating that after a polynomial number of steps in the number of states, distinct semiautomata become nearly uncorrelated, yielding the desired hardness result.

Cite this Paper


BibTeX
@InProceedings{pmlr-v336-giapitzakis26a, title = {On the Statistical Query Complexity of Learning Semiautomata: a Random Walk Approach}, author = {Giapitzakis, George and Fountoulakis, Kimon and Nichani, Eshaan and Lee, Jason D.}, booktitle = {Proceedings of Thirty Ninth Conference on Learning Theory}, pages = {2639--2678}, year = {2026}, editor = {Hanneke, Steve and Lattimore, Tor}, volume = {336}, series = {Proceedings of Machine Learning Research}, month = {29 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v336/main/assets/giapitzakis26a/giapitzakis26a.pdf}, url = {https://proceedings.mlr.press/v336/giapitzakis26a.html}, abstract = {Semiautomata form a rich class of sequence-processing algorithms with applications in natural language processing, robotics, computational biology, and data mining. We establish the first Statistical Query hardness result for semiautomata under the uniform distribution over input words and initial states. We show that Statistical Query hardness can be established when both the alphabet size and input length are polynomial in the number of states. Unlike the case of deterministic finite automata, where hardness typically arises through the hardness of the language they recognize (e.g., parity), our result is derived solely from the internal state-transition structure of semiautomata. Our analysis reduces the task of distinguishing the final states of two semiautomata to studying the behavior of a random walk on the group $S_{N} \times S_{N}$. By applying tools from Fourier analysis and the representation theory of the symmetric group, we obtain tight spectral gap bounds, demonstrating that after a polynomial number of steps in the number of states, distinct semiautomata become nearly uncorrelated, yielding the desired hardness result.} }
Endnote
%0 Conference Paper %T On the Statistical Query Complexity of Learning Semiautomata: a Random Walk Approach %A George Giapitzakis %A Kimon Fountoulakis %A Eshaan Nichani %A Jason D. Lee %B Proceedings of Thirty Ninth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2026 %E Steve Hanneke %E Tor Lattimore %F pmlr-v336-giapitzakis26a %I PMLR %P 2639--2678 %U https://proceedings.mlr.press/v336/giapitzakis26a.html %V 336 %X Semiautomata form a rich class of sequence-processing algorithms with applications in natural language processing, robotics, computational biology, and data mining. We establish the first Statistical Query hardness result for semiautomata under the uniform distribution over input words and initial states. We show that Statistical Query hardness can be established when both the alphabet size and input length are polynomial in the number of states. Unlike the case of deterministic finite automata, where hardness typically arises through the hardness of the language they recognize (e.g., parity), our result is derived solely from the internal state-transition structure of semiautomata. Our analysis reduces the task of distinguishing the final states of two semiautomata to studying the behavior of a random walk on the group $S_{N} \times S_{N}$. By applying tools from Fourier analysis and the representation theory of the symmetric group, we obtain tight spectral gap bounds, demonstrating that after a polynomial number of steps in the number of states, distinct semiautomata become nearly uncorrelated, yielding the desired hardness result.
APA
Giapitzakis, G., Fountoulakis, K., Nichani, E. & Lee, J.D.. (2026). On the Statistical Query Complexity of Learning Semiautomata: a Random Walk Approach. Proceedings of Thirty Ninth Conference on Learning Theory, in Proceedings of Machine Learning Research 336:2639-2678 Available from https://proceedings.mlr.press/v336/giapitzakis26a.html.

Related Material