Pareto-optimal Non-uniform Language Generation

Moses Charikar, Chirag Pabbaraju
Proceedings of The 37th International Conference on Algorithmic Learning Theory, PMLR 313:1-27, 2026.

Abstract

Kleinberg and Mullainathan (2024) recently proposed an interesting model for language generation in the limit: Given a countable collection of languages, and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate _new_ strings from the target language, such that all strings generated beyond some finite time are valid. Li, Raman, and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong _non-uniform_ generation guarantees in this model, giving algorithms that generate new valid strings from $L$ after seeing a number of distinct input strings $t(L)$ that depends only on $L$ (and the collection), but _not_ the enumeration order. However, for both these works, the language-wise _generation times_ $t(L)$ of the algorithm can be strictly sub-optimal. In this work, we study _Pareto-optimality_ of non-uniform language generation in the limit. We propose an algorithm whose generation times $t^\star(L)$ are (almost) Pareto-optimal: any other algorithm whose generation time for some language $L$ is strictly smaller than $t^\star(L)$ _must satisfy_ that its generation time for some _other_ language $L'$ is strictly worse than $t^\star(L')$. Pareto-optimality is essentially the best that one can achieve for non-uniform generation. Our algorithmic framework conveniently adapts to further give Pareto-optimal non-uniform generation algorithms in the practically motivated settings of _noisy_ as well as _representative_ generation.
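To make the model concrete, here is a minimal toy sketch (not the paper's algorithm) of generation in the limit: languages are sets of integers, the hypothetical collection consists of the multiples of each $i$, and the generator emits an unseen element of the first language in the enumeration that is consistent with every string seen so far. Ordering the collection from more specific to less specific mirrors the "guess a minimal consistent language" idea.

```python
# Toy sketch of generation in the limit over a small countable collection.
# Languages are modeled as finite sets of integers; the (hypothetical)
# collection here is L_i = {multiples of i up to 100}.

def consistent(language, seen):
    # A language is consistent if every enumerated string belongs to it.
    return seen <= language

def generate(collection, seen):
    """Emit a new (unseen) string from the first consistent language.

    The collection is ordered from most specific to least specific, so
    the generator guesses a minimal consistent language and outputs an
    element of it that the adversary has not yet enumerated."""
    for lang in collection:
        if consistent(lang, seen):
            for s in sorted(lang):
                if s not in seen:
                    return s
    return None

# Collection ordered from smallest (multiples of 5) to largest (all integers).
collection = [set(range(i, 101, i)) for i in (5, 4, 3, 2, 1)]

# The adversary enumerates the multiples of 4 in some order; once enough
# distinct strings have been seen, the guessed language is a subset of the
# target, so every generated string is valid.
target = set(range(4, 101, 4))
seen = {4, 8, 12, 16}
out = generate(collection, seen)
print(out, out in target)  # a new valid string from the target language
```

Note the role of the ordering: had the collection listed the multiples of 1 first, the generator could output a string outside the target; the non-uniform guarantees in the paper bound, for each language $L$, how many distinct input strings suffice before all outputs are valid.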

Cite this Paper


BibTeX
@InProceedings{pmlr-v313-charikar26a,
  title     = {Pareto-optimal Non-uniform Language Generation},
  author    = {Charikar, Moses and Pabbaraju, Chirag},
  booktitle = {Proceedings of The 37th International Conference on Algorithmic Learning Theory},
  pages     = {1--27},
  year      = {2026},
  editor    = {Telgarsky, Matus and Ullman, Jonathan},
  volume    = {313},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v313/main/assets/charikar26a/charikar26a.pdf},
  url       = {https://proceedings.mlr.press/v313/charikar26a.html},
  abstract  = {Kleinberg and Mullainathan (2024) recently proposed an interesting model for language generation in the limit: Given a countable collection of languages, and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate _new_ strings from the target language, such that all strings generated beyond some finite time are valid. Li, Raman, and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong _non-uniform_ generation guarantees in this model, giving algorithms that generate new valid strings from $L$ after seeing a number of distinct input strings $t(L)$ that depends only on $L$ (and the collection), but _not_ the enumeration order. However, for both these works, the language-wise _generation times_ $t(L)$ of the algorithm can be strictly sub-optimal. In this work, we study _Pareto-optimality_ of non-uniform language generation in the limit. We propose an algorithm whose generation times $t^\star(L)$ are (almost) Pareto-optimal: any other algorithm whose generation time for some language $L$ is strictly smaller than $t^\star(L)$ _must satisfy_ that its generation time for some _other_ language $L'$ is strictly worse than $t^\star(L')$. Pareto-optimality is essentially the best that one can achieve for non-uniform generation. Our algorithmic framework conveniently adapts to further give Pareto-optimal non-uniform generation algorithms in the practically motivated settings of _noisy_ as well as _representative_ generation.}
}
Endnote
%0 Conference Paper
%T Pareto-optimal Non-uniform Language Generation
%A Moses Charikar
%A Chirag Pabbaraju
%B Proceedings of The 37th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Matus Telgarsky
%E Jonathan Ullman
%F pmlr-v313-charikar26a
%I PMLR
%P 1--27
%U https://proceedings.mlr.press/v313/charikar26a.html
%V 313
%X Kleinberg and Mullainathan (2024) recently proposed an interesting model for language generation in the limit: Given a countable collection of languages, and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate _new_ strings from the target language, such that all strings generated beyond some finite time are valid. Li, Raman, and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong _non-uniform_ generation guarantees in this model, giving algorithms that generate new valid strings from $L$ after seeing a number of distinct input strings $t(L)$ that depends only on $L$ (and the collection), but _not_ the enumeration order. However, for both these works, the language-wise _generation times_ $t(L)$ of the algorithm can be strictly sub-optimal. In this work, we study _Pareto-optimality_ of non-uniform language generation in the limit. We propose an algorithm whose generation times $t^\star(L)$ are (almost) Pareto-optimal: any other algorithm whose generation time for some language $L$ is strictly smaller than $t^\star(L)$ _must satisfy_ that its generation time for some _other_ language $L'$ is strictly worse than $t^\star(L')$. Pareto-optimality is essentially the best that one can achieve for non-uniform generation. Our algorithmic framework conveniently adapts to further give Pareto-optimal non-uniform generation algorithms in the practically motivated settings of _noisy_ as well as _representative_ generation.
APA
Charikar, M. & Pabbaraju, C. (2026). Pareto-optimal Non-uniform Language Generation. Proceedings of The 37th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 313:1-27. Available from https://proceedings.mlr.press/v313/charikar26a.html.