Position: Intent-aligned AI Systems Must Optimize for Agency Preservation

Catalin Mitelut, Benjamin Smith, Peter Vamplew
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:35851-35875, 2024.

Abstract

A central approach to AI-safety research has been to generate aligned AI systems: i.e., systems that do not deceive users and yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that truthful AIs aligned solely to human intent are insufficient, and that preservation of humans' long-term agency may be a more robust standard that may need to be separated out and explicitly optimized for. We discuss the science of intent and control, how human intent can be manipulated, and we provide a formal definition of agency-preserving AI-human interactions focusing on forward-looking, explicit agency evaluations. Our work points to a novel pathway for human harm in AI-human interactions and proposes solutions to this challenge.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-mitelut24a,
  title     = {Position: Intent-aligned {AI} Systems Must Optimize for Agency Preservation},
  author    = {Mitelut, Catalin and Smith, Benjamin and Vamplew, Peter},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {35851--35875},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/mitelut24a/mitelut24a.pdf},
  url       = {https://proceedings.mlr.press/v235/mitelut24a.html},
  abstract  = {A central approach to AI-safety research has been to generate aligned AI systems: i.e. systems that do not deceive users and yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that truthful AIs aligned solely to human intent are insufficient and that preservation of long-term agency of humans may be a more robust standard that may need to be separated and explicitly optimized for. We discuss the science of intent and control and how human intent can be manipulated and we provide a formal definition of agency-preserving AI-human interactions focusing on forward-looking explicit agency evaluations. Our work points to a novel pathway for human harm in AI-human interactions and proposes solutions to this challenge.}
}
Endnote
%0 Conference Paper
%T Position: Intent-aligned AI Systems Must Optimize for Agency Preservation
%A Catalin Mitelut
%A Benjamin Smith
%A Peter Vamplew
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-mitelut24a
%I PMLR
%P 35851--35875
%U https://proceedings.mlr.press/v235/mitelut24a.html
%V 235
%X A central approach to AI-safety research has been to generate aligned AI systems: i.e. systems that do not deceive users and yield actions or recommendations that humans might judge as consistent with their intentions and goals. Here we argue that truthful AIs aligned solely to human intent are insufficient and that preservation of long-term agency of humans may be a more robust standard that may need to be separated and explicitly optimized for. We discuss the science of intent and control and how human intent can be manipulated and we provide a formal definition of agency-preserving AI-human interactions focusing on forward-looking explicit agency evaluations. Our work points to a novel pathway for human harm in AI-human interactions and proposes solutions to this challenge.
APA
Mitelut, C., Smith, B. & Vamplew, P. (2024). Position: Intent-aligned AI Systems Must Optimize for Agency Preservation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:35851-35875. Available from https://proceedings.mlr.press/v235/mitelut24a.html.