SoundSynp: Sound Source Detection from Raw Waveforms with Multi-Scale Synperiodic Filterbanks
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:9010-9023, 2023.
We propose synperiodic filter banks, a novel multi-scale learnable filter bank construction strategy in which all filters are synchronized by their rotating periodicity. Synchronizing filters to a given periodicity naturally yields filters whose temporal length is shorter when they carry a higher frequency response, and longer when they carry a lower one. Such filters inherently maintain a better time-frequency resolution trade-off. By further varying the periodicity, we can easily obtain a group of synperiodic filter banks, in which filters with the same frequency response in different banks differ in temporal length. Convolving these filter banks with the raw sound waveform achieves multi-scale perception in the time domain. Moreover, recursively applying the same filter banks to the 2x-downsampled waveform enables multi-scale perception in the frequency domain. Benefiting from multi-scale perception in both the time and frequency domains, the proposed synperiodic filter banks learn a multi-scale time-frequency representation in a data-driven way. Experiments on both sound source direction of arrival (DoA) and physical location detection tasks show the superiority of synperiodic filter banks.
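The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses fixed Hann-windowed cosine kernels rather than learnable filters, it assumes that "synchronized by periodicity" means every filter in a bank spans the same number of cycles of its own center frequency, and it downsamples by plain 2x decimation without anti-aliasing. The names `synperiodic_bank`, `apply_bank`, and `multiscale_response` are hypothetical.

```python
import numpy as np


def synperiodic_bank(center_freqs_hz, n_periods, sample_rate):
    """Build one bank whose filters each span `n_periods` cycles (assumed design)."""
    bank = []
    for f in center_freqs_hz:
        # Same number of cycles for every filter: higher frequency -> fewer samples.
        length = max(3, int(round(n_periods * sample_rate / f)))
        t = (np.arange(length) - (length - 1) / 2) / sample_rate
        kernel = np.hanning(length) * np.cos(2 * np.pi * f * t)
        bank.append(kernel / np.linalg.norm(kernel))  # unit-energy kernel
    return bank


def apply_bank(waveform, bank):
    """Convolve a 1-D waveform with every filter; returns (n_filters, n_samples).

    Assumes the waveform is longer than the longest kernel.
    """
    return np.stack([np.convolve(waveform, k, mode="same") for k in bank])


def multiscale_response(waveform, banks, n_levels):
    """Apply all banks at each level, halving the sample count between levels."""
    levels = []
    x = np.asarray(waveform, dtype=float)
    for _ in range(n_levels):
        levels.append([apply_bank(x, bank) for bank in banks])
        x = x[::2]  # naive 2x decimation (assumption: no anti-aliasing filter)
    return levels


sample_rate = 16000
freqs = [250, 1000, 4000]
# Two banks with different periodicity: same center frequencies, different lengths.
banks = [synperiodic_bank(freqs, p, sample_rate) for p in (4, 8)]
print([len(k) for k in banks[0]])  # [256, 64, 16]
print([len(k) for k in banks[1]])  # [512, 128, 32]

time_axis = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * time_axis)
responses = multiscale_response(tone, banks, n_levels=2)
```

Here `responses[level][bank]` is an array with one row per filter. Varying `n_periods` across banks gives the multi-scale view in time, while reusing the same kernels on the decimated signal shifts each filter to half its original frequency in Hz, which gives the multi-scale view in frequency.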