HEAR: Holistic Evaluation of Audio Representations

Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, PMLR 176:125-145, 2022.

Abstract

What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. HEAR was launched as a NeurIPS 2021 shared challenge. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It still remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.
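The "common API" mentioned above constrains what every submitted embedding model must expose. As a rough illustration, here is a minimal toy stand-in following my reading of the challenge's published interface: a `load_model` entry point plus `get_scene_embeddings` (one fixed-size vector per clip) and `get_timestamp_embeddings` (a sequence of frame embeddings with their timestamps). The attribute names, the FFT "embedding", and all numeric choices below are illustrative placeholders, not any submitted model; real entries operate on torch tensors rather than numpy arrays.

```python
import numpy as np

class ToyModel:
    """Placeholder model exposing the attributes the HEAR API expects."""
    sample_rate = 16000            # rate the model expects input audio at
    scene_embedding_size = 64      # dims of the whole-clip embedding
    timestamp_embedding_size = 64  # dims of each per-frame embedding
    hop_samples = 4000             # 250 ms hop between timestamps (toy choice)

def load_model(model_file_path: str = "") -> ToyModel:
    """Return a ready-to-use model; real entries restore weights from disk."""
    return ToyModel()

def _frame_embedding(frame: np.ndarray, size: int) -> np.ndarray:
    """Toy embedding: log magnitude spectrum truncated to `size` dims."""
    spec = np.abs(np.fft.rfft(frame, n=2 * size))[:size]
    return np.log1p(spec)

def get_timestamp_embeddings(audio: np.ndarray, model: ToyModel):
    """audio: (n_sounds, n_samples) ->
    embeddings (n_sounds, n_timestamps, d), timestamps in ms (n_sounds, n_timestamps)."""
    n_sounds, n_samples = audio.shape
    starts = np.arange(0, n_samples - model.hop_samples + 1, model.hop_samples)
    emb = np.stack([
        [_frame_embedding(clip[s:s + model.hop_samples],
                          model.timestamp_embedding_size) for s in starts]
        for clip in audio
    ])
    ts = np.tile((starts + model.hop_samples / 2) / model.sample_rate * 1000.0,
                 (n_sounds, 1))
    return emb, ts

def get_scene_embeddings(audio: np.ndarray, model: ToyModel) -> np.ndarray:
    """One fixed-size embedding per clip: mean-pool the timestamp embeddings."""
    emb, _ = get_timestamp_embeddings(audio, model)
    return emb.mean(axis=1)
```

Because every submission implements the same three entry points, the HEAR evaluation code can train shallow downstream predictors on any model's embeddings without fine-tuning or model-specific glue, which is what makes the cross-team comparison and longitudinal studies possible.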

Cite this Paper


BibTeX
@InProceedings{pmlr-v176-turian22a,
  title     = {HEAR: Holistic Evaluation of Audio Representations},
  author    = {Turian, Joseph and Shier, Jordie and Khan, Humair Raj and Raj, Bhiksha and Schuller, Bj\"{o}rn W. and Steinmetz, Christian J. and Malloy, Colin and Tzanetakis, George and Velarde, Gissel and McNally, Kirk and Henry, Max and Pinto, Nicolas and Noufi, Camille and Clough, Christian and Herremans, Dorien and Fonseca, Eduardo and Engel, Jesse and Salamon, Justin and Esling, Philippe and Manocha, Pranay and Watanabe, Shinji and Jin, Zeyu and Bisk, Yonatan},
  booktitle = {Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track},
  pages     = {125--145},
  year      = {2022},
  editor    = {Kiela, Douwe and Ciccone, Marco and Caputo, Barbara},
  volume    = {176},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--14 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v176/turian22a/turian22a.pdf},
  url       = {https://proceedings.mlr.press/v176/turian22a.html},
  abstract  = {What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. HEAR was launched as a NeurIPS 2021 shared challenge. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It still remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.}
}
Endnote
%0 Conference Paper
%T HEAR: Holistic Evaluation of Audio Representations
%A Joseph Turian
%A Jordie Shier
%A Humair Raj Khan
%A Bhiksha Raj
%A Björn W. Schuller
%A Christian J. Steinmetz
%A Colin Malloy
%A George Tzanetakis
%A Gissel Velarde
%A Kirk McNally
%A Max Henry
%A Nicolas Pinto
%A Camille Noufi
%A Christian Clough
%A Dorien Herremans
%A Eduardo Fonseca
%A Jesse Engel
%A Justin Salamon
%A Philippe Esling
%A Pranay Manocha
%A Shinji Watanabe
%A Zeyu Jin
%A Yonatan Bisk
%B Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track
%C Proceedings of Machine Learning Research
%D 2022
%E Douwe Kiela
%E Marco Ciccone
%E Barbara Caputo
%F pmlr-v176-turian22a
%I PMLR
%P 125--145
%U https://proceedings.mlr.press/v176/turian22a.html
%V 176
%X What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. HEAR was launched as a NeurIPS 2021 shared challenge. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It still remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.
APA
Turian, J., Shier, J., Khan, H.R., Raj, B., Schuller, B.W., Steinmetz, C.J., Malloy, C., Tzanetakis, G., Velarde, G., McNally, K., Henry, M., Pinto, N., Noufi, C., Clough, C., Herremans, D., Fonseca, E., Engel, J., Salamon, J., Esling, P., Manocha, P., Watanabe, S., Jin, Z. & Bisk, Y. (2022). HEAR: Holistic Evaluation of Audio Representations. Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, in Proceedings of Machine Learning Research 176:125-145. Available from https://proceedings.mlr.press/v176/turian22a.html.