Human Control: Definitions and Algorithms
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:271-281, 2023.
Abstract
How can humans stay in control of advanced artificial intelligence systems? One proposal is corrigibility, which requires the agent to follow the instructions of a human overseer without inappropriately influencing them. In this paper, we formally define a variant of corrigibility called shutdown instructability, and show that it implies appropriate shutdown behavior, retention of human autonomy, and avoidance of user harm. We also analyse the related concepts of non-obstruction and shutdown alignment, as well as three previously proposed algorithms for human control and one new algorithm.