Flexible and Efficient Grammar-Constrained Decoding

Kanghee Park, Timothy Zhou, Loris D’Antoni
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:48262-48275, 2025.

Abstract

Large Language Models (LLMs) are often asked to generate structured outputs that obey precise syntactic rules, such as code snippets or formatted data. Grammar-constrained decoding (GCD) can guarantee that LLM outputs match such rules by masking out tokens that will provably lead to outputs that do not belong to a specified context-free grammar (CFG). To guarantee soundness, GCD algorithms have to compute how a given LLM subword tokenizer can “align” with the tokens used by a given context-free grammar and compute token masks based on this information. Doing so efficiently is challenging, and existing GCD algorithms require tens of minutes to preprocess common grammars. We present a new GCD algorithm together with an implementation that offers 17.71x faster offline preprocessing than existing approaches while preserving state-of-the-art efficiency in online mask computation.
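
The abstract describes the core GCD mechanism: at each decoding step, any token whose continuation provably leaves the grammar is masked out before sampling. The Python sketch below illustrates that idea only; it is not the paper's algorithm (which additionally handles the alignment between subword tokens and grammar tokens), and the ToyGrammar prefix recognizer and mask_logits helper are hypothetical names introduced for illustration.

import math

class ToyGrammar:
    # Hypothetical stand-in for a CFG prefix recognizer: accepts every
    # string that is a prefix of some balanced-parentheses word.
    def accepts_prefix(self, s):
        depth = 0
        for ch in s:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:      # closed more than opened: unrecoverable
                    return False
            else:                  # symbol outside the grammar's alphabet
                return False
        return True                # can still be extended to a valid word

def mask_logits(logits, prefix, vocab, grammar):
    # Set the logit of every token that provably leads outside the
    # grammar to -inf, so that token can never be sampled.
    out = list(logits)
    for i, tok in enumerate(vocab):
        if not grammar.accepts_prefix(prefix + tok):
            out[i] = -math.inf
    return out

vocab = ["(", ")", "()", "(("]
logits = [0.1, 0.9, 0.4, 0.2]
print(mask_logits(logits, "", vocab, ToyGrammar()))
# -> [0.1, -inf, 0.4, 0.2]: ")" is masked, since no balanced word starts
#    with a closing parenthesis.

The expensive part that the paper targets is the offline step this sketch glosses over: precomputing, for a real subword vocabulary and a real CFG, which tokens can follow which grammar states, so that the online mask at each step is cheap to compute.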

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-park25l,
  title     = {Flexible and Efficient Grammar-Constrained Decoding},
  author    = {Park, Kanghee and Zhou, Timothy and D'Antoni, Loris},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {48262--48275},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/park25l/park25l.pdf},
  url       = {https://proceedings.mlr.press/v267/park25l.html},
  abstract  = {Large Language Models (LLMs) are often asked to generate structured outputs that obey precise syntactic rules, such as code snippets or formatted data. Grammar-constrained decoding (GCD) can guarantee that LLM outputs match such rules by masking out tokens that will provably lead to outputs that do not belong to a specified context-free grammar (CFG). To guarantee soundness, GCD algorithms have to compute how a given LLM subword tokenizer can “align” with the tokens used by a given context-free grammar and compute token masks based on this information. Doing so efficiently is challenging, and existing GCD algorithms require tens of minutes to preprocess common grammars. We present a new GCD algorithm together with an implementation that offers 17.71x faster offline preprocessing than existing approaches while preserving state-of-the-art efficiency in online mask computation.}
}
Endnote
%0 Conference Paper
%T Flexible and Efficient Grammar-Constrained Decoding
%A Kanghee Park
%A Timothy Zhou
%A Loris D’Antoni
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-park25l
%I PMLR
%P 48262--48275
%U https://proceedings.mlr.press/v267/park25l.html
%V 267
%X Large Language Models (LLMs) are often asked to generate structured outputs that obey precise syntactic rules, such as code snippets or formatted data. Grammar-constrained decoding (GCD) can guarantee that LLM outputs match such rules by masking out tokens that will provably lead to outputs that do not belong to a specified context-free grammar (CFG). To guarantee soundness, GCD algorithms have to compute how a given LLM subword tokenizer can “align” with the tokens used by a given context-free grammar and compute token masks based on this information. Doing so efficiently is challenging, and existing GCD algorithms require tens of minutes to preprocess common grammars. We present a new GCD algorithm together with an implementation that offers 17.71x faster offline preprocessing than existing approaches while preserving state-of-the-art efficiency in online mask computation.
APA
Park, K., Zhou, T. & D’Antoni, L. (2025). Flexible and Efficient Grammar-Constrained Decoding. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:48262-48275. Available from https://proceedings.mlr.press/v267/park25l.html.
