Focus On This, Not That! Steering LLMs with Adaptive Feature Specification

Tom A. Lamb, Adam Davies, Alasdair Paren, Philip Torr, Francesco Pinto
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:32339-32374, 2025.

Abstract

Despite the success of Instruction Tuning (IT) in training large language models (LLMs), such models often leverage spurious or biased features learnt from their training data and can become misaligned, leading to undesired behaviours. While existing techniques can steer model behaviour at inference-time, they are often post-hoc and do not embed steering as an intrinsic model feature. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. Across diverse benchmarks, we demonstrate that FIT: (i) successfully steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises under distribution shifts and to previously unseen focus features. FIT therefore offers a lightweight, intrinsic mechanism for building more robust, fair, and easily controllable LLMs.
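The abstract describes conditioning a model's response on explicitly specified focus and ignore features. As a purely illustrative sketch (the wording and helper name below are assumptions, not the paper's actual prompt template), a FIT-style focus instruction might be prepended to a task prompt like this:

```python
def format_focus_prompt(question, focus_features=None, ignore_features=None):
    """Prepend a focus specification to a task prompt.

    Hypothetical illustration of FIT-style conditioning: the model is told
    which features to base its answer on and which to disregard. The exact
    phrasing used in the paper may differ.
    """
    parts = []
    if focus_features:
        parts.append("Focus on: " + ", ".join(focus_features) + ".")
    if ignore_features:
        parts.append("Ignore: " + ", ".join(ignore_features) + ".")
    parts.append(question)
    return "\n".join(parts)


prompt = format_focus_prompt(
    "Is this review positive or negative? 'Great plot, awful acting.'",
    focus_features=["overall sentiment"],
    ignore_features=["mentions of specific actors"],
)
print(prompt)
```

During focus instruction tuning, prompts of this form would be paired with targets whose labels depend on the specified features, so that at inference time changing the focus specification steers the model's behaviour.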

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-lamb25a,
  title     = {Focus On This, Not That! {S}teering {LLM}s with Adaptive Feature Specification},
  author    = {Lamb, Tom A. and Davies, Adam and Paren, Alasdair and Torr, Philip and Pinto, Francesco},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {32339--32374},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lamb25a/lamb25a.pdf},
  url       = {https://proceedings.mlr.press/v267/lamb25a.html},
  abstract  = {Despite the success of Instruction Tuning (IT) in training large language models (LLMs), such models often leverage spurious or biased features learnt from their training data and can become misaligned, leading to undesired behaviours. While existing techniques can steer model behaviour at inference-time, they are often post-hoc and do not embed steering as an intrinsic model feature. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. Across diverse benchmarks, we demonstrate that FIT: (i) successfully steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises under distribution shifts and to previously unseen focus features. FIT therefore offers a lightweight, intrinsic mechanism for building more robust, fair, and easily controllable LLMs.}
}
Endnote
%0 Conference Paper
%T Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
%A Tom A. Lamb
%A Adam Davies
%A Alasdair Paren
%A Philip Torr
%A Francesco Pinto
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lamb25a
%I PMLR
%P 32339--32374
%U https://proceedings.mlr.press/v267/lamb25a.html
%V 267
%X Despite the success of Instruction Tuning (IT) in training large language models (LLMs), such models often leverage spurious or biased features learnt from their training data and can become misaligned, leading to undesired behaviours. While existing techniques can steer model behaviour at inference-time, they are often post-hoc and do not embed steering as an intrinsic model feature. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. Across diverse benchmarks, we demonstrate that FIT: (i) successfully steers behaviour at inference time; (ii) increases robustness by amplifying core task signals and down-weighting spurious cues; (iii) mitigates social bias by suppressing demographic attributes; and (iv) generalises under distribution shifts and to previously unseen focus features. FIT therefore offers a lightweight, intrinsic mechanism for building more robust, fair, and easily controllable LLMs.
APA
Lamb, T.A., Davies, A., Paren, A., Torr, P. & Pinto, F. (2025). Focus On This, Not That! Steering LLMs with Adaptive Feature Specification. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:32339-32374. Available from https://proceedings.mlr.press/v267/lamb25a.html.