DCRes2Net: Enhanced Res2Net with Dimensional Feature Fusion for Speaker Verification
Proceedings of 2025 2nd International Conference on Machine Learning and Intelligent Computing, PMLR 278:251-259, 2025.
Abstract
Many classical convolutional architectures have been applied to speaker verification; however, one-dimensional or two-dimensional convolutions alone are insufficient for modeling speaker features efficiently. To address this limitation, this paper introduces a multi-dimensional feature fusion strategy and presents DCRes2Net, an enhanced Res2Net architecture based on dimensional feature fusion. The different feature modeling techniques complement each other, combining their respective strengths in temporal and spatial feature extraction to represent multi-dimensional data comprehensively. Experiments on the VoxCeleb dataset show that the proposed architecture achieves competitive performance and generalizes robustly.