Heads up! Large Language Models Can Perform Tasks Without Your Instruction via Selective Attention Head Masking

Senyu Han, Hongchuan Zeng, Kai Yu, Lu Chen
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:21948-21973, 2025.

Abstract

Large language models (LLMs) consist of numerous Transformer modules, and while the models can perform various functions, it remains an open question of how these modules are combined to elicit distinct inherent functionalities. In this paper, we investigate the modules inside LLMs and demonstrate that, by simply masking or retaining specific attention heads during inference, LLMs can exhibit specific task functionalities without requiring explicit instructions or modifications to the model parameters. Experiments across various models and tasks reveal that LLMs inherently encode “functional pathways”, the structured groups of interdependent attention heads that are crucial for executing specific tasks. These pathways not only govern the model’s functional behaviors but also enhance parameter efficiency, as suppressing attention heads outside the pathway can improve task performance. The code is available in this repository: https://github.com/OpenDFM/HeadsUp.
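The masking operation described in the abstract is easy to prototype. Below is a minimal, hypothetical sketch (not the authors' released implementation from https://github.com/OpenDFM/HeadsUp) of zeroing out chosen attention heads at inference time in a LLaMA-style Hugging Face model, by intercepting the concatenated per-head outputs before each layer's output projection. The model name and the layer/head indices in heads_to_mask are placeholder assumptions, not pathways identified in the paper.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model; any LLaMA-style checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

# Size of each head's slice in the o_proj input (heads are concatenated along the last dim).
head_dim = model.config.hidden_size // model.config.num_attention_heads

# Hypothetical example: per-layer sets of head indices to suppress. A real pathway
# (or its complement) would come from the paper's head-selection procedure.
heads_to_mask = {0: {3, 7}, 5: {0, 1, 12}}

def make_pre_hook(head_ids):
    # Forward pre-hook on o_proj: zero the slices belonging to the masked heads.
    def pre_hook(module, args):
        attn_out = args[0].clone()  # shape (batch, seq, num_heads * head_dim)
        for h in head_ids:
            attn_out[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (attn_out,) + args[1:]
    return pre_hook

handles = [
    model.model.layers[layer].self_attn.o_proj.register_forward_pre_hook(make_pre_hook(heads))
    for layer, heads in heads_to_mask.items()
]

# Run generation with the selected heads silenced; no task instruction in the prompt.
inputs = tokenizer("good morning", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Remove the hooks to restore the unmasked model.
for h in handles:
    h.remove()

Hooking the input of o_proj is just one convenient way to suppress whole heads without touching model parameters; the same effect could be achieved by editing attention outputs inside a custom attention module.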

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-han25l,
  title     = {Heads up! {L}arge Language Models Can Perform Tasks Without Your Instruction via Selective Attention Head Masking},
  author    = {Han, Senyu and Zeng, Hongchuan and Yu, Kai and Chen, Lu},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {21948--21973},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/han25l/han25l.pdf},
  url       = {https://proceedings.mlr.press/v267/han25l.html},
  abstract  = {Large language models (LLMs) consist of numerous Transformer modules, and while the models can perform various functions, it remains an open question of how these modules are combined to elicit distinct inherent functionalities. In this paper, we investigate the modules inside LLMs and demonstrate that, by simply masking or retaining specific attention heads during inference, LLMs can exhibit specific task functionalities without requiring explicit instructions or modifications to the model parameters. Experiments across various models and tasks reveal that LLMs inherently encode “functional pathways”, the structured groups of interdependent attention heads that are crucial for executing specific tasks. These pathways not only govern the model’s functional behaviors but also enhance parameter efficiency, as suppressing attention heads outside the pathway can improve task performance. The code is available in this repository: https://github.com/OpenDFM/HeadsUp.}
}
Endnote
%0 Conference Paper
%T Heads up! Large Language Models Can Perform Tasks Without Your Instruction via Selective Attention Head Masking
%A Senyu Han
%A Hongchuan Zeng
%A Kai Yu
%A Lu Chen
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-han25l
%I PMLR
%P 21948--21973
%U https://proceedings.mlr.press/v267/han25l.html
%V 267
%X Large language models (LLMs) consist of numerous Transformer modules, and while the models can perform various functions, it remains an open question of how these modules are combined to elicit distinct inherent functionalities. In this paper, we investigate the modules inside LLMs and demonstrate that, by simply masking or retaining specific attention heads during inference, LLMs can exhibit specific task functionalities without requiring explicit instructions or modifications to the model parameters. Experiments across various models and tasks reveal that LLMs inherently encode “functional pathways”, the structured groups of interdependent attention heads that are crucial for executing specific tasks. These pathways not only govern the model’s functional behaviors but also enhance parameter efficiency, as suppressing attention heads outside the pathway can improve task performance. The code is available in this repository: https://github.com/OpenDFM/HeadsUp.
APA
Han, S., Zeng, H., Yu, K. & Chen, L. (2025). Heads up! Large Language Models Can Perform Tasks Without Your Instruction via Selective Attention Head Masking. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:21948-21973. Available from https://proceedings.mlr.press/v267/han25l.html.