SAFE: Spiking Neural Network-based Audio Fidelity Evaluation

Aaditya Arunkumar Khant, Raveen Wijewickrama, Murtuza Jadliwala
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:974-989, 2025.

Abstract

Recent advances in generative AI have enabled the creation of highly realistic synthetic audio, which poses significant challenges in voice authentication, media verification, and fraud detection. While Artificial Neural Networks (ANNs) are frequently used for fake audio detection, they often struggle to generalize to unseen and complex manipulations, particularly partial fake audio, where real and synthetic segments are seamlessly combined. This paper explores the use of Spiking Neural Networks (SNNs) for fake and partial fake audio detection – an unexplored area. Taking advantage of the inherent energy efficiency and temporal processing capabilities of SNNs, we propose novel SNN-based architectures for both tasks. We perform comprehensive evaluations that include hyperparameter tuning, cross-data set generalization, noise robustness, and partial fake audio detection using multiple large-scale public audio datasets. Our results show that SNNs achieve performance comparable to state-of-the-art ANN models while showing better generalization capabilities and robustness to noise. These SNN-based approaches also resulted in additional advantages, such as reduced model sizes and the ability to classify individual segments, making them more suitable for resource-constrained and real-time voice authentication applications. This work lays a foundation for exploring SNNs as countermeasures against audio spoofing in security-critical applications.

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-khant25a,
  title     = {SAFE: Spiking Neural Network-based Audio Fidelity Evaluation},
  author    = {Khant, Aaditya Arunkumar and Wijewickrama, Raveen and Jadliwala, Murtuza},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages     = {974--989},
  year      = {2025},
  editor    = {Lee, Hung-yi and Liu, Tongliang},
  volume    = {304},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/khant25a/khant25a.pdf},
  url       = {https://proceedings.mlr.press/v304/khant25a.html},
  abstract  = {Recent advances in generative AI have enabled the creation of highly realistic synthetic audio, which poses significant challenges in voice authentication, media verification, and fraud detection. While Artificial Neural Networks (ANNs) are frequently used for fake audio detection, they often struggle to generalize to unseen and complex manipulations, particularly partial fake audio, where real and synthetic segments are seamlessly combined. This paper explores the use of Spiking Neural Networks (SNNs) for fake and partial fake audio detection – an unexplored area. Taking advantage of the inherent energy efficiency and temporal processing capabilities of SNNs, we propose novel SNN-based architectures for both tasks. We perform comprehensive evaluations that include hyperparameter tuning, cross-data set generalization, noise robustness, and partial fake audio detection using multiple large-scale public audio datasets. Our results show that SNNs achieve performance comparable to state-of-the-art ANN models while showing better generalization capabilities and robustness to noise. These SNN-based approaches also resulted in additional advantages, such as reduced model sizes and the ability to classify individual segments, making them more suitable for resource-constrained and real-time voice authentication applications. This work lays a foundation for exploring SNNs as countermeasures against audio spoofing in security-critical applications.}
}
Endnote
%0 Conference Paper
%T SAFE: Spiking Neural Network-based Audio Fidelity Evaluation
%A Aaditya Arunkumar Khant
%A Raveen Wijewickrama
%A Murtuza Jadliwala
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-khant25a
%I PMLR
%P 974--989
%U https://proceedings.mlr.press/v304/khant25a.html
%V 304
%X Recent advances in generative AI have enabled the creation of highly realistic synthetic audio, which poses significant challenges in voice authentication, media verification, and fraud detection. While Artificial Neural Networks (ANNs) are frequently used for fake audio detection, they often struggle to generalize to unseen and complex manipulations, particularly partial fake audio, where real and synthetic segments are seamlessly combined. This paper explores the use of Spiking Neural Networks (SNNs) for fake and partial fake audio detection – an unexplored area. Taking advantage of the inherent energy efficiency and temporal processing capabilities of SNNs, we propose novel SNN-based architectures for both tasks. We perform comprehensive evaluations that include hyperparameter tuning, cross-data set generalization, noise robustness, and partial fake audio detection using multiple large-scale public audio datasets. Our results show that SNNs achieve performance comparable to state-of-the-art ANN models while showing better generalization capabilities and robustness to noise. These SNN-based approaches also resulted in additional advantages, such as reduced model sizes and the ability to classify individual segments, making them more suitable for resource-constrained and real-time voice authentication applications. This work lays a foundation for exploring SNNs as countermeasures against audio spoofing in security-critical applications.
APA
Khant, A.A., Wijewickrama, R. & Jadliwala, M. (2025). SAFE: Spiking Neural Network-based Audio Fidelity Evaluation. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:974-989. Available from https://proceedings.mlr.press/v304/khant25a.html.