Position: A Safe Harbor for AI Evaluation and Red Teaming

Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Alex Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:32691-32710, 2024.

Abstract

Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems. However, the terms of service and enforcement strategies used by prominent AI companies to deter model misuse disincentivize good faith safety evaluations. This causes some researchers to fear that conducting such research or releasing their findings will result in account suspensions or legal reprisal. Although some companies offer researcher access programs, they are an inadequate substitute for independent research access, as they have limited community representation, receive inadequate funding, and lack independence from corporate incentives. We propose that major generative AI developers commit to providing a legal and technical safe harbor, protecting public interest safety research and removing the threat of account suspensions or legal reprisal. These proposals emerged from our collective experience conducting safety, privacy, and trustworthiness research on generative AI systems, where norms and incentives could be better aligned with public interests, without exacerbating model misuse. We believe these commitments are a necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-longpre24a,
  title     = {Position: A Safe Harbor for {AI} Evaluation and Red Teaming},
  author    = {Longpre, Shayne and Kapoor, Sayash and Klyman, Kevin and Ramaswami, Ashwin and Bommasani, Rishi and Blili-Hamelin, Borhane and Huang, Yangsibo and Skowron, Aviya and Yong, Zheng Xin and Kotha, Suhas and Zeng, Yi and Shi, Weiyan and Yang, Xianjun and Southen, Reid and Robey, Alexander and Chao, Patrick and Yang, Diyi and Jia, Ruoxi and Kang, Daniel and Pentland, Alex and Narayanan, Arvind and Liang, Percy and Henderson, Peter},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {32691--32710},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/longpre24a/longpre24a.pdf},
  url       = {https://proceedings.mlr.press/v235/longpre24a.html}
}
Endnote
%0 Conference Paper
%T Position: A Safe Harbor for AI Evaluation and Red Teaming
%A Shayne Longpre
%A Sayash Kapoor
%A Kevin Klyman
%A Ashwin Ramaswami
%A Rishi Bommasani
%A Borhane Blili-Hamelin
%A Yangsibo Huang
%A Aviya Skowron
%A Zheng Xin Yong
%A Suhas Kotha
%A Yi Zeng
%A Weiyan Shi
%A Xianjun Yang
%A Reid Southen
%A Alexander Robey
%A Patrick Chao
%A Diyi Yang
%A Ruoxi Jia
%A Daniel Kang
%A Alex Pentland
%A Arvind Narayanan
%A Percy Liang
%A Peter Henderson
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-longpre24a
%I PMLR
%P 32691--32710
%U https://proceedings.mlr.press/v235/longpre24a.html
%V 235
APA
Longpre, S., Kapoor, S., Klyman, K., Ramaswami, A., Bommasani, R., Blili-Hamelin, B., Huang, Y., Skowron, A., Yong, Z.X., Kotha, S., Zeng, Y., Shi, W., Yang, X., Southen, R., Robey, A., Chao, P., Yang, D., Jia, R., Kang, D., Pentland, A., Narayanan, A., Liang, P., & Henderson, P. (2024). Position: A Safe Harbor for AI Evaluation and Red Teaming. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:32691-32710. Available from https://proceedings.mlr.press/v235/longpre24a.html.