SoundSynp: Sound Source Detection from Raw Waveforms with Multi-Scale Synperiodic Filterbanks

Yuhang He, Andrew Markham
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:9010-9023, 2023.

Abstract

We propose synperiodic filter banks, a novel multi-scale learnable filter bank construction strategy in which all filters are synchronized by their rotating periodicity. Synchronizing at a given periodicity naturally yields filters whose temporal length shrinks as their frequency response rises, and vice versa. Such filters inherently maintain a better time-frequency resolution trade-off. By further alternating the periodicity, we easily obtain a group of such filter banks (which we call synperiodic filter banks), where filters with the same frequency response in different groups differ in temporal length. Convolving these filter banks with the raw sound waveform achieves multi-scale perception in the time domain. Moreover, applying the same filter banks recursively to the 2x-downsampled waveform enables multi-scale perception in the frequency domain. Benefiting from multi-scale perception in both the time and frequency domains, the proposed synperiodic filter banks learn a multi-scale time-frequency representation in a data-driven way. Experiments on both sound source direction of arrival (DoA) and physical location detection tasks show the superiority of synperiodic filter banks.
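
The construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Hann-windowed cosine filters with fixed (not learned) centre frequencies, reads "synchronized by periodicity" as "every filter spans the same number of cycles of its own frequency", and uses naive decimation without anti-alias filtering. The names synperiodic_bank and multiscale_response, and all parameter values, are hypothetical.

```python
import numpy as np


def synperiodic_bank(freqs_hz, periods, sample_rate):
    """Build one filter per frequency; each spans `periods` cycles of its own frequency.

    Assumption: filters are Hann-windowed cosines with unit L2 norm.
    Higher frequencies therefore get shorter kernels, and vice versa.
    """
    bank = []
    for f in freqs_hz:
        # Kernel length in samples: `periods` cycles at frequency f.
        length = max(int(round(periods * sample_rate / f)), 3)
        t = (np.arange(length) - (length - 1) / 2) / sample_rate
        kernel = np.hanning(length) * np.cos(2 * np.pi * f * t)
        bank.append(kernel / np.linalg.norm(kernel))
    return bank


def multiscale_response(wave, freqs_hz, period_groups, sample_rate, n_octaves=2):
    """Apply every bank to the waveform and to its recursively 2x-downsampled copies.

    Returns a dict keyed by (octave, periods) holding one response per filter.
    """
    # One bank per periodicity: same frequencies, different temporal lengths.
    banks = {p: synperiodic_bank(freqs_hz, p, sample_rate) for p in period_groups}
    out = {}
    for octave in range(n_octaves):
        for p, bank in banks.items():
            out[(octave, p)] = [np.convolve(wave, k, mode="same") for k in bank]
        # Assumption: naive decimation; a real system would low-pass first.
        wave = wave[::2]
    return out


# Usage: a 1-second, 400 Hz tone sampled at 16 kHz.
sample_rate = 16000
time_axis = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 400 * time_axis)
responses = multiscale_response(tone, [200, 400, 800], [4, 8], sample_rate)
```

Varying the number of cycles (4 versus 8 here) changes the temporal length of every filter while keeping its centre frequency, which gives multiple scales in time. Reusing the same kernels on the 2x-downsampled waveform makes each kernel respond to half the original frequency per octave, which gives multiple scales in frequency.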

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-he23c,
  title = {SoundSynp: Sound Source Detection from Raw Waveforms with Multi-Scale Synperiodic Filterbanks},
  author = {He, Yuhang and Markham, Andrew},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = {9010--9023},
  year = {2023},
  editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = {206},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v206/he23c/he23c.pdf},
  url = {https://proceedings.mlr.press/v206/he23c.html},
  abstract = {We propose synperiodic filter banks, a novel multi-scale learnable filter bank construction strategy that all filters are synchronized by their rotating periodicity. By synchronizing in a certain periodicity, we naturally get filters whose temporal length are reduced if they carry higher frequency response, and vice versa. Such filters internally maintain a better time-frequency resolution trade-off. By further alternating the periodicity, we can easily obtain a group of synperiodic filter bank (we call synperiodic filter banks), where filters of same frequency response in different groups differ in temporal length. Convolving these filter banks with sound raw waveform achieves multi-scale perception in time domain. Moreover, applying the same filter banks to recursively process the 2x-downsampled waveform enables multi-scale perception in the frequency domain. Benefiting from the multi-scale perception in both time and frequency domains, our proposed synperiodic filter banks learn multi-scale time-frequency representation in a data-driven way. Experiments on both sound source direction of arrival (DoA) and physical location detection task show the superiority of synperiodic filter banks.}
}
Endnote
%0 Conference Paper
%T SoundSynp: Sound Source Detection from Raw Waveforms with Multi-Scale Synperiodic Filterbanks
%A Yuhang He
%A Andrew Markham
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-he23c
%I PMLR
%P 9010--9023
%U https://proceedings.mlr.press/v206/he23c.html
%V 206
%X We propose synperiodic filter banks, a novel multi-scale learnable filter bank construction strategy that all filters are synchronized by their rotating periodicity. By synchronizing in a certain periodicity, we naturally get filters whose temporal length are reduced if they carry higher frequency response, and vice versa. Such filters internally maintain a better time-frequency resolution trade-off. By further alternating the periodicity, we can easily obtain a group of synperiodic filter bank (we call synperiodic filter banks), where filters of same frequency response in different groups differ in temporal length. Convolving these filter banks with sound raw waveform achieves multi-scale perception in time domain. Moreover, applying the same filter banks to recursively process the 2x-downsampled waveform enables multi-scale perception in the frequency domain. Benefiting from the multi-scale perception in both time and frequency domains, our proposed synperiodic filter banks learn multi-scale time-frequency representation in a data-driven way. Experiments on both sound source direction of arrival (DoA) and physical location detection task show the superiority of synperiodic filter banks.
APA
He, Y. & Markham, A. (2023). SoundSynp: Sound Source Detection from Raw Waveforms with Multi-Scale Synperiodic Filterbanks. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:9010-9023. Available from https://proceedings.mlr.press/v206/he23c.html.
