Splitting & Integrating: Out-of-Distribution Detection via Adversarial Gradient Attribution
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:76213-76224, 2025.
Abstract
Out-of-distribution (OOD) detection is essential for enhancing the robustness and security of deep learning models in unknown and dynamic data environments. Gradient-based OOD detection methods, such as GAIA, analyse the explanation pattern representations of in-distribution (ID) and OOD samples by examining the sensitivity of model outputs w.r.t. model inputs, resulting in superior performance compared to traditional OOD detection methods. However, we argue that the non-zero gradient behaviors of OOD samples are not sufficiently distinguishable, especially when ID samples are subject to random perturbations in high-dimensional spaces, which negatively impacts the accuracy of OOD detection. In this paper, we propose a novel OOD detection method called S & I, based on layer Splitting and gradient Integration via Adversarial Gradient Attribution. Specifically, our approach splits the model's intermediate layers and iteratively updates adversarial examples layer by layer. We then integrate the attribution gradients from each intermediate layer along the attribution path from the adversarial examples to the actual input, yielding true explanation pattern representations for both ID and OOD samples. Experiments demonstrate that our S & I algorithm achieves state-of-the-art results, with an average FPR95 of 29.05% (ResNet34)/38.61% (WRN40) on the CIFAR100 benchmark and 37.31% (BiT-S) on the ImageNet benchmark. Our code is available at: https://github.com/LMBTough/S-I
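The core idea the abstract describes, integrating attribution gradients along a path from an adversarial example back to the actual input, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the adversarial baseline below is produced by a single generic FGSM step rather than the layer-wise splitting and updating described above, and all names (model, x, y, epsilon, n_steps) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adversarial_baseline(model, x, y, epsilon=0.01):
    """One FGSM step, used here as a stand-in for the paper's
    layer-wise adversarial update (assumption, not the S & I procedure)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()

def integrated_attribution(model, x, x_adv, y, n_steps=32):
    """Integrate input gradients along the straight-line path from the
    adversarial baseline x_adv to the actual input x (Riemann sum)."""
    total = torch.zeros_like(x)
    for k in range(1, n_steps + 1):
        alpha = k / n_steps
        point = (x_adv + alpha * (x - x_adv)).detach().requires_grad_(True)
        # Gradient of the predicted-class logit w.r.t. the interpolated input
        logit = model(point).gather(1, y.unsqueeze(1)).sum()
        grad, = torch.autograd.grad(logit, point)
        total += grad
    # Accumulated gradients scaled by the displacement along the path
    return (x - x_adv) * total / n_steps

# Usage sketch: a scalar statistic of this attribution map (e.g. its L1 norm)
# could serve as the per-sample score thresholded for OOD detection.
# attribution = integrated_attribution(model, x, adversarial_baseline(model, x, y), y)
# score = attribution.abs().flatten(1).sum(dim=1)
```

The sketch only shows attribution at the input; the method described in the abstract additionally splits intermediate layers and aggregates attribution gradients from each of them.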