DCRes2Net: Enhanced Res2Net with Dimensional Feature Fusion for Speaker Verification

Ya Li, Bin Zhou, Bo Hu
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:251-259, 2025.

Abstract

Many classical convolutional architectures have been introduced to the field of speaker verification; however, solely employing one-dimensional or two-dimensional convolutions is insufficient for efficiently modeling speaker features. To address this limitation, this paper introduces a multi-dimensional feature fusion strategy and presents an enhanced Res2Net architecture based on dimensional feature fusion. Different feature modeling techniques complement each other, fully leveraging their respective advantages in temporal and spatial feature extraction to achieve comprehensive representation of multi-dimensional data. Experiments conducted on the VoxCeleb dataset demonstrate that the proposed architecture achieves competitive performance along with robust generalizability.

Cite this Paper


BibTeX
@InProceedings{pmlr-v278-li25f,
  title     = {DCRes2Net: Enhanced Res2Net with Dimensional Feature Fusion for Speaker Verification},
  author    = {Li, Ya and Zhou, Bin and Hu, Bo},
  booktitle = {Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing},
  pages     = {251--259},
  year      = {2025},
  editor    = {Zeng, Nianyin and Pachori, Ram Bilas and Wang, Dongshu},
  volume    = {278},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v278/main/assets/li25f/li25f.pdf},
  url       = {https://proceedings.mlr.press/v278/li25f.html},
  abstract  = {Many classical convolutional architectures have been introduced to the field of speaker verification; however, solely employing one-dimensional or two-dimensional convolutions is insufficient for efficiently modeling speaker features. To address this limitation, this paper introduces a multi-dimensional feature fusion strategy and presents an enhanced Res2Net architecture based on dimensional feature fusion. Different feature modeling techniques complement each other, fully leveraging their respective advantages in temporal and spatial feature extraction to achieve comprehensive representation of multi-dimensional data. Experiments conducted on the VoxCeleb dataset demonstrate that the proposed architecture achieves competitive performance along with robust generalizability.}
}
Endnote
%0 Conference Paper
%T DCRes2Net: Enhanced Res2Net with Dimensional Feature Fusion for Speaker Verification
%A Ya Li
%A Bin Zhou
%A Bo Hu
%B Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing
%C Proceedings of Machine Learning Research
%D 2025
%E Nianyin Zeng
%E Ram Bilas Pachori
%E Dongshu Wang
%F pmlr-v278-li25f
%I PMLR
%P 251--259
%U https://proceedings.mlr.press/v278/li25f.html
%V 278
%X Many classical convolutional architectures have been introduced to the field of speaker verification; however, solely employing one-dimensional or two-dimensional convolutions is insufficient for efficiently modeling speaker features. To address this limitation, this paper introduces a multi-dimensional feature fusion strategy and presents an enhanced Res2Net architecture based on dimensional feature fusion. Different feature modeling techniques complement each other, fully leveraging their respective advantages in temporal and spatial feature extraction to achieve comprehensive representation of multi-dimensional data. Experiments conducted on the VoxCeleb dataset demonstrate that the proposed architecture achieves competitive performance along with robust generalizability.
APA
Li, Y., Zhou, B. & Hu, B. (2025). DCRes2Net: Enhanced Res2Net with Dimensional Feature Fusion for Speaker Verification. Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, in Proceedings of Machine Learning Research 278:251-259. Available from https://proceedings.mlr.press/v278/li25f.html.