MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement

Wooseok Shin, Byung Hoon Lee, Jin Sob Kim, Hyun Joon Park, Sung Won Han
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:31521-31538, 2023.

Abstract

In speech enhancement, MetricGAN-based approaches reduce the discrepancy between the $L_p$ loss and evaluation metrics by utilizing a non-differentiable evaluation metric as the objective function. However, optimizing multiple metrics simultaneously remains challenging owing to conflicting gradient directions. In this paper, we propose an effective multi-metric optimization method in MetricGAN via online knowledge distillation, MetricGAN-OKD. MetricGAN-OKD, which consists of multiple generators paired one-to-one with target metrics, enables each generator to learn a single metric reliably while improving performance on the other metrics by mimicking the other generators. Experimental results on speech enhancement and listening enhancement tasks reveal that the proposed method significantly improves performance in terms of multiple metrics compared to existing multi-metric optimization methods. Further, the good performance of MetricGAN-OKD is explained in terms of network generalizability and correlation between metrics.
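To make the mechanism concrete, the following is a minimal PyTorch-style sketch of the per-generator objective the abstract describes, not the authors' implementation. It assumes a MetricGAN-style mask generator, one metric discriminator per target metric predicting scores normalized to [0, 1], a squared-error adversarial term toward the maximum score, and an L1 mimicry term between enhanced magnitude spectrograms; the names generator_step, generators, discriminators, and okd_weight are illustrative.

import torch
import torch.nn.functional as F

def generator_step(generators, discriminators, noisy_mag, clean_mag, okd_weight=0.5):
    """Per-generator losses for one MetricGAN-OKD-style update (sketch).

    generators[k] is paired one-to-one with discriminators[k], a learned
    surrogate for one target metric (e.g., PESQ or STOI) whose scores are
    assumed normalized to [0, 1]. Each generator is pushed (i) adversarially
    toward the maximum score of its own metric and (ii) toward the outputs
    of the other generators (the online knowledge-distillation term).
    """
    # Mask-based enhancement, as in MetricGAN: each generator predicts a
    # mask that is applied to the noisy magnitude spectrogram.
    enhanced = [g(noisy_mag) * noisy_mag for g in generators]

    losses = []
    for k, (d_k, e_k) in enumerate(zip(discriminators, enhanced)):
        # Adversarial term: make the k-th metric surrogate predict the
        # best possible score (1.0 after normalization).
        score = d_k(e_k, clean_mag)
        adv = F.mse_loss(score, torch.ones_like(score))

        # OKD term: mimic the peers' enhanced outputs. Peers are detached
        # so each generator distills from, not through, the others.
        peers = [e_j.detach() for j, e_j in enumerate(enhanced) if j != k]
        okd = sum(F.l1_loss(e_k, p) for p in peers) / max(len(peers), 1)

        losses.append(adv + okd_weight * okd)
    return losses

In a full training loop, each discriminator would additionally be trained to regress the true (non-differentiable) metric score of sampled utterances, as in the original MetricGAN; that step is omitted from the sketch.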

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-shin23b,
  title     = {{M}etric{GAN}-{OKD}: Multi-Metric Optimization of {M}etric{GAN} via Online Knowledge Distillation for Speech Enhancement},
  author    = {Shin, Wooseok and Lee, Byung Hoon and Kim, Jin Sob and Park, Hyun Joon and Han, Sung Won},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {31521--31538},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/shin23b/shin23b.pdf},
  url       = {https://proceedings.mlr.press/v202/shin23b.html},
  abstract  = {In speech enhancement, MetricGAN-based approaches reduce the discrepancy between the $L_p$ loss and evaluation metrics by utilizing a non-differentiable evaluation metric as the objective function. However, optimizing multiple metrics simultaneously remains challenging owing to the problem of confusing gradient directions. In this paper, we propose an effective multi-metric optimization method in MetricGAN via online knowledge distillation—MetricGAN-OKD. MetricGAN-OKD, which consists of multiple generators and target metrics, related by a one-to-one correspondence, enables generators to learn with respect to a single metric reliably while improving performance with respect to other metrics by mimicking other generators. Experimental results on speech enhancement and listening enhancement tasks reveal that the proposed method significantly improves performance in terms of multiple metrics compared to existing multi-metric optimization methods. Further, the good performance of MetricGAN-OKD is explained in terms of network generalizability and correlation between metrics.}
}
Endnote
%0 Conference Paper
%T MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement
%A Wooseok Shin
%A Byung Hoon Lee
%A Jin Sob Kim
%A Hyun Joon Park
%A Sung Won Han
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-shin23b
%I PMLR
%P 31521--31538
%U https://proceedings.mlr.press/v202/shin23b.html
%V 202
%X In speech enhancement, MetricGAN-based approaches reduce the discrepancy between the $L_p$ loss and evaluation metrics by utilizing a non-differentiable evaluation metric as the objective function. However, optimizing multiple metrics simultaneously remains challenging owing to the problem of confusing gradient directions. In this paper, we propose an effective multi-metric optimization method in MetricGAN via online knowledge distillation—MetricGAN-OKD. MetricGAN-OKD, which consists of multiple generators and target metrics, related by a one-to-one correspondence, enables generators to learn with respect to a single metric reliably while improving performance with respect to other metrics by mimicking other generators. Experimental results on speech enhancement and listening enhancement tasks reveal that the proposed method significantly improves performance in terms of multiple metrics compared to existing multi-metric optimization methods. Further, the good performance of MetricGAN-OKD is explained in terms of network generalizability and correlation between metrics.
APA
Shin, W., Lee, B.H., Kim, J.S., Park, H.J. & Han, S.W. (2023). MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:31521-31538. Available from https://proceedings.mlr.press/v202/shin23b.html.