Position: AI Safety Must Embrace an Antifragile Perspective

Ming Jin, Hyunin Lee
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:81588-81605, 2025.

Abstract

This position paper contends that modern AI research must adopt an antifragile perspective on safety: one in which the system’s capacity to handle rare or out-of-distribution (OOD) events adapts and expands over repeated exposures. Conventional static benchmarks and single-shot robustness tests overlook the reality that environments evolve and that models, if left unchallenged, can drift into maladaptation (e.g., reward hacking, over-optimization, or atrophy of broader capabilities). We argue that an antifragile approach, which emphasizes leveraging current uncertainties to better prepare for potentially greater, more unpredictable uncertainties in the future rather than striving to rapidly reduce them, is pivotal for the long-term reliability of open-ended ML systems. In this position paper, we first identify key limitations of static testing, including limited scenario diversity, reward hacking, and over-alignment. We then explore the potential of dynamic, antifragile solutions to manage rare events. Crucially, we advocate for a fundamental recalibration of the methods used to measure, benchmark, and continually improve AI safety over the long term, complementing existing robustness approaches by providing ethical and practical guidelines for fostering an antifragile AI safety community.
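To make the contrast between static testing and an antifragile loop concrete, here is a purely illustrative Python sketch (not drawn from the paper; all names such as evaluate, perturb, static_pool, and capability are hypothetical stand-ins). A fixed benchmark never surfaces new failure modes, whereas the loop below feeds observed failures back as harder scenarios, so coverage of rare events expands over repeated exposures.

import random

random.seed(0)

def evaluate(capability: float, scenario: float) -> bool:
    """Toy check: the model 'handles' a scenario if its severity is
    below the model's current capability threshold."""
    return scenario < capability

def perturb(scenario: float) -> float:
    """Generate a harder variant of a failing scenario (toy OOD shift)."""
    return scenario * (1.0 + random.random())

# Static testing: a fixed benchmark, evaluated once.
static_pool = [0.2, 0.5, 0.8]

# Antifragile loop: failures are perturbed into harder scenarios, and the
# capability threshold adapts after each exposure (a stand-in for
# retraining on stress-test failures).
pool = list(static_pool)
capability = 0.6
for round_ in range(5):
    failures = [s for s in pool if not evaluate(capability, s)]
    # Expand the pool with harder variants of what the model failed on.
    pool.extend(perturb(s) for s in failures)
    # Adapt: grow capability toward the hardest scenario seen so far.
    capability = max(capability, 0.9 * max(pool))
    print(f"round {round_}: pool={len(pool)} scenarios, capability={capability:.2f}")

The point of the sketch is that the scenario pool and the capability threshold co-evolve: each exposure to a failure enlarges the set of stressors the system is prepared for, rather than certifying performance against a frozen test set.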

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-jin25m,
  title     = {Position: {AI} Safety Must Embrace an Antifragile Perspective},
  author    = {Jin, Ming and Lee, Hyunin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {81588--81605},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/jin25m/jin25m.pdf},
  url       = {https://proceedings.mlr.press/v267/jin25m.html}
}
Endnote
%0 Conference Paper
%T Position: AI Safety Must Embrace an Antifragile Perspective
%A Ming Jin
%A Hyunin Lee
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-jin25m
%I PMLR
%P 81588--81605
%U https://proceedings.mlr.press/v267/jin25m.html
%V 267
APA
Jin, M. & Lee, H. (2025). Position: AI Safety Must Embrace an Antifragile Perspective. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:81588-81605. Available from https://proceedings.mlr.press/v267/jin25m.html.
