Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

Luca Beurer-Kellner, Marc Fischer, Martin Vechev
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:3658-3673, 2024.

Abstract

To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding methods propose to enforce strict formal language constraints during generation. However, as we show in this work, not only do such methods often incur performance overhead during generation, but many of them also significantly impair task accuracy, if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even almost 2$\times$ speedup over unconstrained decoding – thereby outperforming existing approaches by a wide margin. We release DOMINO as open source at https://github.com/eth-sri/domino.
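The core idea the abstract refers to can be sketched in a few lines. This is an illustrative toy only (all names are hypothetical; it is not the DOMINO algorithm): constrained decoding masks out, at each step, every sub-word token whose concatenation with the text generated so far would make the formal constraint unsatisfiable.

```python
# Hypothetical sketch of constraint-based token masking, not DOMINO itself.
# At each decoding step, keep only the sub-word tokens that leave the
# partial output extensible to a valid final string.

def allowed_tokens(generated, vocab, is_viable_prefix):
    """Return the tokens t such that generated + t can still lead to a valid output."""
    return [t for t in vocab if is_viable_prefix(generated + t)]

# Toy constraint: the final output must consist only of digits.
vocab = ["12", "3", "a", "4b", "7"]
digits_only = lambda s: s.isdigit()

print(allowed_tokens("", vocab, digits_only))  # → ['12', '3', '7']
```

Note that a naively character-aligned checker that only admitted single-character tokens would forbid the multi-character sub-word "12" even though it is perfectly valid; this is the kind of vocabulary/constraint misalignment the abstract identifies as a source of accuracy loss.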

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-beurer-kellner24a,
  title     = {Guiding {LLM}s The Right Way: Fast, Non-Invasive Constrained Generation},
  author    = {Beurer-Kellner, Luca and Fischer, Marc and Vechev, Martin},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {3658--3673},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/beurer-kellner24a/beurer-kellner24a.pdf},
  url       = {https://proceedings.mlr.press/v235/beurer-kellner24a.html},
  abstract  = {To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding methods propose to enforce strict formal language constraints during generation. However, as we show in this work, not only do such methods often incur performance overhead during generation, but many of them also significantly impair task accuracy, if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even almost 2$\times$ speedup over unconstrained decoding – thereby outperforming existing approaches by a wide margin. We release DOMINO as open source at https://github.com/eth-sri/domino.}
}
Endnote
%0 Conference Paper
%T Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
%A Luca Beurer-Kellner
%A Marc Fischer
%A Martin Vechev
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-beurer-kellner24a
%I PMLR
%P 3658--3673
%U https://proceedings.mlr.press/v235/beurer-kellner24a.html
%V 235
%X To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding methods propose to enforce strict formal language constraints during generation. However, as we show in this work, not only do such methods often incur performance overhead during generation, but many of them also significantly impair task accuracy, if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even almost 2$\times$ speedup over unconstrained decoding – thereby outperforming existing approaches by a wide margin. We release DOMINO as open source at https://github.com/eth-sri/domino.
APA
Beurer-Kellner, L., Fischer, M. & Vechev, M. (2024). Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:3658-3673. Available from https://proceedings.mlr.press/v235/beurer-kellner24a.html.
