Pareto-optimal Non-uniform Language Generation

Moses Charikar, Chirag Pabbaraju
Proceedings of The 37th International Conference on Algorithmic Learning Theory, PMLR 313:1-27, 2026.

Abstract

Kleinberg and Mullainathan (2024) recently proposed an interesting model for language generation in the limit: Given a countable collection of languages, and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate _new_ strings from the target language, such that all strings generated beyond some finite time are valid. Li, Raman, and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong _non-uniform_ generation guarantees in this model, giving algorithms that generate new valid strings from $L$ after seeing a number of distinct input strings $t(L)$ that depends only on $L$ (and the collection), but _not_ the enumeration order. However, for both these works, the language-wise _generation times_ $t(L)$ of the algorithm can be strictly sub-optimal. In this work, we study _Pareto-optimality_ of non-uniform language generation in the limit. We propose an algorithm whose generation times $t^\star(L)$ are (almost) Pareto-optimal: any other algorithm whose generation time for some language $L$ is strictly smaller than $t^\star(L)$ _must satisfy_ that its generation time for some _other_ language $L'$ is strictly worse than $t^\star(L')$. Pareto-optimality is essentially the best that one can achieve for non-uniform generation. Our algorithmic framework conveniently adapts to further give Pareto-optimal non-uniform generation algorithms in the practically motivated settings of _noisy_ as well as _representative_ generation.
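To make the model concrete, here is a minimal toy sketch (not the paper's algorithm) of generation in the limit: languages are sets of integers, the hypothetical collection consists of the multiples of each $i$, and the generator emits an unseen element of the first language in the enumeration that is consistent with every string seen so far. Ordering the collection from more specific to less specific mirrors the "guess a minimal consistent language" idea.

```python
# Toy sketch of generation in the limit over a small countable collection.
# Languages are modeled as finite sets of integers; the (hypothetical)
# collection here is L_i = {multiples of i up to 100}.

def consistent(language, seen):
    # A language is consistent if every enumerated string belongs to it.
    return seen <= language

def generate(collection, seen):
    """Emit a new (unseen) string from the first consistent language.

    The collection is ordered from most specific to least specific, so
    the generator guesses a minimal consistent language and outputs an
    element of it that the adversary has not yet enumerated."""
    for lang in collection:
        if consistent(lang, seen):
            for s in sorted(lang):
                if s not in seen:
                    return s
    return None

# Collection ordered from smallest (multiples of 5) to largest (all integers).
collection = [set(range(i, 101, i)) for i in (5, 4, 3, 2, 1)]

# The adversary enumerates the multiples of 4 in some order; once enough
# distinct strings have been seen, the guessed language is a subset of the
# target, so every generated string is valid.
target = set(range(4, 101, 4))
seen = {4, 8, 12, 16}
out = generate(collection, seen)
print(out, out in target)  # a new valid string from the target language
```

Note the role of the ordering: had the collection listed the multiples of 1 first, the generator could output a string outside the target; the non-uniform guarantees in the paper bound, for each language $L$, how many distinct input strings suffice before all outputs are valid.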

Cite this Paper


BibTeX
@InProceedings{pmlr-v313-charikar26a,
  title     = {Pareto-optimal Non-uniform Language Generation},
  author    = {Charikar, Moses and Pabbaraju, Chirag},
  booktitle = {Proceedings of The 37th International Conference on Algorithmic Learning Theory},
  pages     = {1--27},
  year      = {2026},
  editor    = {Telgarsky, Matus and Ullman, Jonathan},
  volume    = {313},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v313/main/assets/charikar26a/charikar26a.pdf},
  url       = {https://proceedings.mlr.press/v313/charikar26a.html},
  abstract  = {Kleinberg and Mullainathan (2024) recently proposed an interesting model for language generation in the limit: Given a countable collection of languages, and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate _new_ strings from the target language, such that all strings generated beyond some finite time are valid. Li, Raman, and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong _non-uniform_ generation guarantees in this model, giving algorithms that generate new valid strings from $L$ after seeing a number of distinct input strings $t(L)$ that depends only on $L$ (and the collection), but _not_ the enumeration order. However, for both these works, the language-wise _generation times_ $t(L)$ of the algorithm can be strictly sub-optimal. In this work, we study _Pareto-optimality_ of non-uniform language generation in the limit. We propose an algorithm whose generation times $t^\star(L)$ are (almost) Pareto-optimal: any other algorithm whose generation time for some language $L$ is strictly smaller than $t^\star(L)$ _must satisfy_ that its generation time for some _other_ language $L'$ is strictly worse than $t^\star(L')$. Pareto-optimality is essentially the best that one can achieve for non-uniform generation. Our algorithmic framework conveniently adapts to further give Pareto-optimal non-uniform generation algorithms in the practically motivated settings of _noisy_ as well as _representative_ generation.}
}
Endnote
%0 Conference Paper
%T Pareto-optimal Non-uniform Language Generation
%A Moses Charikar
%A Chirag Pabbaraju
%B Proceedings of The 37th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Matus Telgarsky
%E Jonathan Ullman
%F pmlr-v313-charikar26a
%I PMLR
%P 1--27
%U https://proceedings.mlr.press/v313/charikar26a.html
%V 313
%X Kleinberg and Mullainathan (2024) recently proposed an interesting model for language generation in the limit: Given a countable collection of languages, and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate _new_ strings from the target language, such that all strings generated beyond some finite time are valid. Li, Raman, and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong _non-uniform_ generation guarantees in this model, giving algorithms that generate new valid strings from $L$ after seeing a number of distinct input strings $t(L)$ that depends only on $L$ (and the collection), but _not_ the enumeration order. However, for both these works, the language-wise _generation times_ $t(L)$ of the algorithm can be strictly sub-optimal. In this work, we study _Pareto-optimality_ of non-uniform language generation in the limit. We propose an algorithm whose generation times $t^\star(L)$ are (almost) Pareto-optimal: any other algorithm whose generation time for some language $L$ is strictly smaller than $t^\star(L)$ _must satisfy_ that its generation time for some _other_ language $L'$ is strictly worse than $t^\star(L')$. Pareto-optimality is essentially the best that one can achieve for non-uniform generation. Our algorithmic framework conveniently adapts to further give Pareto-optimal non-uniform generation algorithms in the practically motivated settings of _noisy_ as well as _representative_ generation.
APA
Charikar, M. & Pabbaraju, C. (2026). Pareto-optimal Non-uniform Language Generation. Proceedings of The 37th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 313:1-27. Available from https://proceedings.mlr.press/v313/charikar26a.html.