Position: AI Safety Must Embrace an Antifragile Perspective
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:81588-81605, 2025.
Abstract
This position paper contends that modern AI research must adopt an antifragile perspective on safety, one in which a system's capacity to handle rare or out-of-distribution (OOD) events adapts and expands over repeated exposures. Conventional static benchmarks and single-shot robustness tests overlook the reality that environments evolve and that models, if left unchallenged, can drift into maladaptation (e.g., reward hacking, over-optimization, or atrophy of broader capabilities). We argue that an antifragile approach, one that emphasizes leveraging current uncertainties to prepare for greater and more unpredictable uncertainties ahead rather than striving to reduce them as rapidly as possible, is pivotal for the long-term reliability of open-ended ML systems. We first identify key limitations of static testing, including limited scenario diversity, reward hacking, and over-alignment. We then explore how dynamic, antifragile solutions can manage rare events. Crucially, we advocate a fundamental recalibration of how AI safety is measured, benchmarked, and continually improved over the long term, complementing existing robustness approaches with ethical and practical guidelines for fostering an antifragile AI safety community.