Choice is what matters after Attention

Chenhan Fu, Guoming Wang, Juncheng Li, Rongxing Lu, Siliang Tang
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:262-270, 2025.

Abstract

The decoding strategies most widely used in large language models (LLMs) today are Top-$p$ Sampling and Top-$k$ Sampling, both of which sit between greedy decoding and random sampling. Inspired by the concept of loss aversion from prospect theory in behavioral economics, and by the endowment effect highlighted by Richard H. Thaler, laureate of the 2017 Nobel Memorial Prize in Economic Sciences, in particular the principle that "the negative utility of an equivalent loss is approximately twice the positive utility of a comparable gain", we develop a new decoding strategy called Loss Sampling. We demonstrate the effectiveness and validity of our method on several LLMs, including Llama-2, Llama-3 and Mistral. Our approach improves text quality by 4-30% across four pure-text tasks while maintaining diversity in text generation. Furthermore, we extend our method to multimodal large models and to Beam Search, demonstrating the effectiveness and versatility of Loss Sampling with improvements of 1-10%.
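
For readers unfamiliar with the two baselines named in the abstract, the sketch below is a minimal NumPy illustration of standard Top-$k$ and Top-$p$ (nucleus) sampling, the strategies that sit between greedy decoding and unrestricted random sampling. It covers only these well-known baselines; the loss-aversion-based weighting that defines Loss Sampling itself is specified in the paper and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def top_k_sample(logits, k):
    """Standard Top-k sampling: keep the k most probable tokens, renormalize, sample."""
    probs = softmax(logits)
    keep = np.argsort(probs)[::-1][:k]      # indices of the k largest probabilities
    q = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=q)

def top_p_sample(logits, p):
    """Standard Top-p (nucleus) sampling: keep the smallest prefix of the sorted
    distribution whose cumulative probability reaches p, renormalize, sample."""
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = min(np.searchsorted(cum, p) + 1, len(order))
    keep = order[:cutoff]
    q = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=q)

# Toy next-token distribution over a 6-token vocabulary.
logits = np.array([2.0, 1.5, 0.3, -0.5, -1.0, -2.0])
print("top-k pick:", top_k_sample(logits, k=3))
print("top-p pick:", top_p_sample(logits, p=0.9))
```

Greedy decoding is the $k=1$ limit and unrestricted random sampling the $p=1$ limit, which is the sense in which the abstract places Top-$k$ and Top-$p$ between the two.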

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-fu25a,
  title     = {Choice is what matters after Attention},
  author    = {Fu, Chenhan and Wang, Guoming and Li, Juncheng and Lu, Rongxing and Tang, Siliang},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {262--270},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/fu25a/fu25a.pdf},
  url       = {https://proceedings.mlr.press/v258/fu25a.html}
}
Endnote
%0 Conference Paper
%T Choice is what matters after Attention
%A Chenhan Fu
%A Guoming Wang
%A Juncheng Li
%A Rongxing Lu
%A Siliang Tang
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-fu25a
%I PMLR
%P 262--270
%U https://proceedings.mlr.press/v258/fu25a.html
%V 258
APA
Fu, C., Wang, G., Li, J., Lu, R. & Tang, S. (2025). Choice is what matters after Attention. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:262-270. Available from https://proceedings.mlr.press/v258/fu25a.html.
