The AI off-switch problem as a signalling game: bounded rationality and incomparability

Alessio Benavoli, Alessandro Facchini, Marco Zaffalon
Proceedings of the Fourteenth International Symposium on Imprecise Probabilities: Theories and Applications, PMLR 290:1-11, 2025.

Abstract

The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human’s utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human’s utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.

Cite this Paper


BibTeX
@InProceedings{pmlr-v290-benavoli25a,
  title = {The AI off-switch problem as a signalling game: bounded rationality and incomparability},
  author = {Benavoli, Alessio and Facchini, Alessandro and Zaffalon, Marco},
  booktitle = {Proceedings of the Fourteenth International Symposium on Imprecise Probabilities: Theories and Applications},
  pages = {1--11},
  year = {2025},
  editor = {Destercke, Sébastien and Erreygers, Alexander and Nendel, Max and Riedel, Frank and Troffaes, Matthias C. M.},
  volume = {290},
  series = {Proceedings of Machine Learning Research},
  month = {15--18 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v290/main/assets/benavoli25a/benavoli25a.pdf},
  url = {https://proceedings.mlr.press/v290/benavoli25a.html},
  abstract = {The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human’s utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human’s utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.}
}
Endnote
%0 Conference Paper
%T The AI off-switch problem as a signalling game: bounded rationality and incomparability
%A Alessio Benavoli
%A Alessandro Facchini
%A Marco Zaffalon
%B Proceedings of the Fourteenth International Symposium on Imprecise Probabilities: Theories and Applications
%C Proceedings of Machine Learning Research
%D 2025
%E Sébastien Destercke
%E Alexander Erreygers
%E Max Nendel
%E Frank Riedel
%E Matthias C. M. Troffaes
%F pmlr-v290-benavoli25a
%I PMLR
%P 1--11
%U https://proceedings.mlr.press/v290/benavoli25a.html
%V 290
%X The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human’s utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human’s utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.
APA
Benavoli, A., Facchini, A. & Zaffalon, M. (2025). The AI off-switch problem as a signalling game: bounded rationality and incomparability. Proceedings of the Fourteenth International Symposium on Imprecise Probabilities: Theories and Applications, in Proceedings of Machine Learning Research 290:1-11. Available from https://proceedings.mlr.press/v290/benavoli25a.html.
